Psychological Bulletin


Harry Helson, Editor
Kansas State University





CONTENTS 


Research with the Wechsler Intelligence Scales for Adults: 1955-60
Wilson H. Guertin, Albert I. Rabin, George H. Frank, and Clayton E. Ladd


Research Uses of Doll Play
Harry Levin and Elinor Wardwell


Alfred E. Goldman


Reliability of a ... Differential Recognition-Threshold ...
Donn Byrne and Joan Holcomb


... on “The Paramorphic Representation of Clinical Judgment”
Joe H. Ward, Jr.


... of the Independent Contributions of Predictors
Paul J. Hoffman


Published Bimonthly by the 
American Psychological Association


Vol. 59, No. 1





Consulting Editors 


Q. McNemar
Stanford University
L. J. Postman
University of California, Berkeley
J. B. Rotter
Ohio State University
S. B. Sells
Texas Christian University
... of Southern California
...ltzman
... of Texas
W. A. Watson, Jr.
Bryn Mawr College







HELEN Orr 
Acting Production Manager 





















Vol. 59, No. 1


JANUARY 1962 


Psychological Bulletin 


RESEARCH WITH THE WECHSLER INTELLIGENCE 
SCALES FOR ADULTS: 
1955-60¹


WILSON H. GUERTIN 


University of Florida 


ALBERT I. RABIN 


Michigan State University 


GEORGE H. FRANK


University of Miami


AND CLAYTON E. LADD


Indiana University


Two important events took place since the bulk of the material
for our previous review (Guertin, Frank, & Rabin, 1956) was
gathered and organized. The first was the publication of the
manual for the revised WB,² known as the WAIS (Wechsler, 1955);
the second was the appearance of a new, rewritten, and
reorganized edition of Wechsler’s Adult Intelligence (1958).

Although the manual was mentioned in our previous review, the
new test it introduced, the WAIS, had not yet become the popular
instrument it is today. It seems to be replacing the old WB as a
research tool and as a clinical and assessment device for many
good reasons. For reviews of the WAIS see Buros (1959). The
present review covers work done with both instruments, for it
spans a period of transition.


1 Through July 1960.

2 The abbreviation WB will be used throughout to indicate the
Wechsler-Bellevue Intelligence Scale, Form I. Form II will be
designated WB II, while WAIS signifies the Wechsler Adult
Intelligence Scale. The names of the subtests also appear in
abbreviated form throughout the paper. The single letters I, C,
A, D, S, and V stand for the Verbal subtests: Information,
Comprehension, Arithmetic, Digits, Similarities, and Vocabulary,
respectively. The two-letter combinations PA, PC, OA, BD, and DS
correspond to the following Performance subtests: Picture
Arrangement, Picture Completion, Object Assembly, Block Designs,
and Digit Symbol, respectively. FS, VIQ, and PIQ stand for Full
Scale, Verbal IQ, and Performance IQ, respectively.

In the closing summary of the previous review, we expressed the
hope for “the creation of a newly standardized instrument,
similar in structure to the WB, but not suffering from the
numerous weaknesses.” The WAIS, in many respects, is the answer
to this wish. A fairly rich harvest of research with this method
is critically considered in the following pages. It may be
added, in agreement with Wittenborn (1957), that:

There is a refreshing trend away from gross empirical
validations which required that tests predict the diagnostic
decisions of psychiatrists or psychologists. Instead, there
seems to be an emphasis on the conceptual validity of the
procedures employed in assessment (p. 331).

The general outline of the present 
review and its organization are quite 
similar to our previous reviews 
(Guertin et al., 1956; Rabin, 1945; 
Rabin & Guertin, 1951). The 
amount of material covered under 
each rubric differs, however, for some
currents have run dry, while previous
trickles have expanded markedly.
The bibliographical coverage is selective
in view of differences in relevance,
quality, and significance of
the various researches reported in
the literature.³


AS A MEASURE OF INTELLIGENCE
Reliability 

An inspection of Wechsler’s tables 
(1958, pp. 102-3) suggests that the 
WAIS IQs and verbal subtests are 
slightly more reliable than compa- 
rable WB IQs and subtests, but that 
the performance subtests (possibly 
excepting DS) have about the same 
reliability coefficients on both tests. 
Perhaps this indication of increased reliability with the WAIS
has curtailed the number of studies reporting test-retest or
split-half reliabilities for this test, as only one has been
published thus far. Over long periods ranging from 1 to 5 years
and using bright “normals” (Bayley, 1957) or psychiatric
patients (Armitage & Pearl, 1958), the WB has yielded
test-retest correlations similar to those found in earlier
reliability studies, i.e., .77-.95.

3 A supplementary bibliography along with the references covered
by this review aims at complete coverage of research articles
employing the adult Wechsler scales. This supplementary
bibliography has been deposited with the American Documentation
Institute. Order Document No. 6843 from ADI Auxiliary
Publications Project, Photoduplication Service, Library of
Congress, Washington 25, D. C., remitting in advance $1.25 for
microfilm or $1.25 for photocopies. Make checks payable to:
Chief, Photoduplication Service, Library of Congress.

Coons and Peacock (1959), using 24 mental hospital patients,
obtained test-retest correlations for all three WAIS IQ scores
of .96 or better, and the standard errors of measurement were
consistent with those obtained with the standardization sample.
From this it was inferred that:

IQ changes on retest with different examiners of more than 6
points can be attributed with reasonable confidence to changes
in the mental state of the patient.


Yet, the practice effects or at least 
increments in IQ scores at the time of 
the second testing were 2.6, 8.6, 
and 5.0 points for VIQ, PIQ, and 
FSIQ, respectively. Consequently, 
the quoted inference needs a qualification,
such as “after appropriately
adjusting for practice effects.” Test-retest
differences were not only
greater but also more variable for the
PIQ than for the VIQ or FSIQs;
thus, it was concluded that “the
Verbal scale is a better indicator of
the level of the original Full Scale
performance than is the Performance
Scale IQ.” At the subtest level, the
test-retest reliabilities are generally 
higher than the split-half reliabilities 
reported in the WAIS manual (1955). 
D had the lowest reliability of all 
subtests with a .84; the other Verbal 
subtests (excepting C with a .89) 
were .94 or better. The Performance 
subtests averaged .88, suggesting to 
the authors that the Verbal subtests 
are more reliable than the Perform- 
ance subtests; however, one should 
remember that the practice effects 
were much more variable on the 
Performance subtests, which would 
reduce the test-retest reliability co- 
efficients. 
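
The qualification suggested above can be made concrete with a short calculation. The sketch below is illustrative only. It assumes the conventional IQ standard deviation of 15 and the textbook formula SEM = SD * sqrt(1 - r), neither of which is attributed to Coons and Peacock in this review, and all function names are hypothetical. It removes the mean practice gain from an observed retest change before comparing the remainder with the quoted 6-point criterion.

```python
import math

# Mean retest gains reported by Coons and Peacock (1959), in IQ points.
PRACTICE_GAIN = {"VIQ": 2.6, "PIQ": 8.6, "FSIQ": 5.0}


def standard_error_of_measurement(sd: float, reliability: float) -> float:
    """Textbook SEM = SD * sqrt(1 - r); an assumed formula, not the authors' own."""
    return sd * math.sqrt(1.0 - reliability)


def practice_adjusted_change(scale: str, first_iq: float, second_iq: float) -> float:
    """Retest change after removing the mean practice gain for that scale."""
    return (second_iq - first_iq) - PRACTICE_GAIN[scale]


def exceeds_criterion(scale: str, first_iq: float, second_iq: float,
                      criterion: float = 6.0) -> bool:
    """True if the adjusted change is larger than the quoted 6-point criterion."""
    return abs(practice_adjusted_change(scale, first_iq, second_iq)) > criterion


if __name__ == "__main__":
    # Assumed SD of 15 and a retest correlation of .96 (the reported lower bound).
    print(round(standard_error_of_measurement(15.0, 0.96), 2))
    # A hypothetical patient whose PIQ rises from 100 to 108 on retest.
    print(exceeds_criterion("PIQ", 100, 108))
```

Under these assumptions an 8-point PIQ gain is smaller than the mean practice effect for that scale, so it would not count as evidence of a change in mental state, whereas the unadjusted rule would flag it.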


Comparative Validity 


WB II and WISC. Earlier comparisons
of the WB and WB II disclosed
that practice effect was appreciably
greater when the WB II
was administered first. Thus, a very
interesting and mystifying phenomenon
confronted and worried Wechsler
workers until Barry, Fulkerson,
Kubla, and Seaquist (1956) failed to
find a significant interaction between
practice effect and the form of the
WB administered first. Furthermore,
they reported lack of equivalence 
between forms for entirely different 
subtests (S, OA, and DS) than re- 
ported by earlier workers. Their 
equivalent form reliability coeffi- 
cient of .71 is consistent with earlier 
findings and is rather high since their 
range of talent (intelligence) was 
only half that of an unrestricted 
sample. 
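
The remark about range of talent can be illustrated with the standard correction for direct restriction of range. This is a sketch under stated assumptions: the review does not say that Barry et al. applied such a correction, "half the range" is read here as a standard deviation half that of an unrestricted sample, and the function name is hypothetical.

```python
import math


def correct_for_range_restriction(r_restricted: float, sd_ratio: float) -> float:
    """Estimate the correlation expected in an unrestricted sample.

    sd_ratio is SD(unrestricted) / SD(restricted) on the selection variable.
    Assumed formula (direct restriction): r*k / sqrt(1 - r^2 + r^2 * k^2).
    """
    r, k = r_restricted, sd_ratio
    return (r * k) / math.sqrt(1.0 - r * r + r * r * k * k)


if __name__ == "__main__":
    # Reported equivalent-form coefficient of .71, with k = 2 as an assumption.
    print(round(correct_for_range_restriction(0.71, 2.0), 2))
```

With a ratio of 1 the function returns the input unchanged, which is a quick check that the correction only acts when the sample is in fact restricted.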

Findings of earlier comparisons 
between the WISC and WB were 
confirmed by Price and Thorne 
(1955). Their sophisticated statisti- 
cal analysis of data disclosed a 
slightly lower WB FSIQ and VIQ, 
while PIQ was slightly higher than 
for corresponding WISC scales. Cor- 
relation between the FSIQs was very 
high for their 11.5-year-old sample 
(.89) and moderate for their 14.5- 
year-old sample (.78), but range of 
talent was considerably lower in the 
older group. 

WAIS. Cole and Webela (1956) 
reported a comparison of the WB 
and WAIS, but their restricted range 
of talent and incomplete counter- 
balancing of the form of the test with 
order of administration prevent any 
findings from being more than sug- 
gestive. Goolishian and Ramsay 
(1956) also were interested in the 
equivalence of the new WAIS and 
the WB, so they studied the two 
arrays of test scores in their hospital 
files. While the design employs 
different subjects for the two test 
scores, thus permitting the operation 
of sampling biases, the investigators 
employed a large N. They failed to
find the extreme differences noted by 
Cole and Webela, but five subtests 
showed significant differences be- 
tween the two tests. Neuringer’s 
careful study (1956) showed FSIQ 
and PIQ were higher for the WB, a 
finding echoed in a more subjective 


report by Sinnett and Mayman 
(1960). Dana’s results (1957a), based 
upon a study of only the Verbal 
scales, revealed no significant differ- 
ences for any of the subtest com- 
parisons, a finding that is quite 
different from that of Cole and 
Webela. Then, in support of the 
large differences between forms found 
by Cole and Webela; Karson, Pool, 
and Freud (1957) reported significant 
differences for five subtests, also pro- 
viding confirmation of some of the 
Goolishian-Ramsay findings. Light 
and Chambers (1958) found, with 
defectives, that the WAIS VIQ and
FSIQ were significantly higher than 
for the WB. Correlation of the FSIQ 
was .77 for their restricted range of 
talent sample. Garfield (1960) found 
BD to be ninth in WAIS subtest 
order of difficulty as compared with 
third place for the WB BD. 

It would appear that the only con- 
sistent finding with samples of aver- 
age or higher intelligence is higher 
scores on BD, DS, PIQ, and FSIQ 
for the WB; and there is little agree- 
ment as to which of the verbal sub- 
tests are lower for the WB, if any. 
Only Neuringer’s study (1956) had all the necessary features of
appropriate range of talent, sufficient N, unbiased samples, and
appropriate counterbalancing to test the equivalence of the WB
and WAIS. After correcting for range of talent, Neuringer’s
correlations for VIQ, PIQ, and FSIQ were, respectively, .89,
.44, and .77, hardly satisfactory for “equivalent form”
reliability.

Other tests. Sines (1958) reported correlations of .77, .78, and
.79 between the Shipley-Hartford and the WB FS scores for three
samples and provided regression equations for predicting WB FSIQ
from the Shipley. Three tests from the Army Classification
Battery correlated .60 to .81 with the WB FS scores (Montague,
Williams, Lubin, & Geiseking, 1957), while Murphy and Langston
(1956) obtained a .83 between the WB FS score and the Army
Classification Battery, Area Aptitude I Test. Higher
correlations between the Revised Beta and WAIS (.81 and .83 for
Negro and white prisoners) were found by Panton (1960).

Sterne (1960) reported a correlation of .84 between the Ammons
Full Range Picture Vocabulary Test (FRPV) and the WAIS FSIQ for
a sample of older medical patients. Allen, Thornton, and Stenger
(1956), using college students with a markedly restricted range
of talent, obtained a correlation of only .46 between the FRPV
and the WB FSIQ. Fisher, Shotwell, and York (1960) found
correlations between FRPV and various WAIS scores ranging from
.36 to .79 with defectives. Borgatta and Corsini (1960) reported
correlations between WAIS FS scores and four forms of their
Quick Word Test of .75 to .83, with the observation that
coefficients are attenuated by reduced range of talent.
Rabinowitz (1956) compared the Kent EGY with the WB FSIQ and
found a correlation of .69 for hospitalized psychiatric patients
with a normal range of intelligence.

Those interested in Raven's Progressive Matrices often use the
Wechsler for comparative purposes. Hall (1957a) found a .72
correlation with the WAIS FS scores, while Stacey and Gill
(1955), working with the restricted range of talent found in
samples of adult defectives, reported a correlation of .68 with
the WB FSIQ. Urmer, Morris, and Wendland (1960), and Moya-Diaz
and Matte-Blanco (1953-55) also studied the matrices and
Wechsler scores. The latter found the tests fairly equivalent
but noted that anxiety and cultural factors were more important
determinants of WB scores than for scores on the matrices.
Confirming this, Levinson (1959) employed a sample of 80%
foreign born with two age ranges. Matrices scores correlated
with the WAIS FSIQ .65 for his 60-69 year olds and .40 for his
70-79 year olds. As expected, he found a negative correlation
between WAIS performance and age, which was greater in the older
group. Had he used WAIS weighted scores instead of IQ, he would
have obtained higher and more appropriate correlations with the
matrices.

Hall (1957b) found the WB FS 
scores and Wechsler Memory Scale 
correlated .75 and concluded there 
was a large overlap in what the two 
tests measure. Strong (1959) found a 
mixture of WAIS and WB FSIQs 
correlated .63 with the Ohio Literacy 
Test for psychiatric patients. One 
would expect a higher correlation for 
weighted score than IQ since the Ohio 
Literacy Test has no correction for 
deterioration with age. 

Summary. The studies reviewed 
in this section, when compared as a 
whole with those covered in the last 
review, are very disappointing. Not 
only have the investigators failed to 
learn from others’ mistakes, but 
there seems to be little tendency to 
design critical and conclusive studies 
to resolve conflicting findings re- 
ported earlier. 

Range of intelligence in the sample is often ignored, frequently
not reported, and only one correlational study employed a
correction for restricted range of talent (intelligence).

It seems useless to remind investigators that equivalence
between tests depends upon both correlation and differences in
mean scores, but we would be remiss were we not to repeat this.
Somewhat encouraging is the tendency to use the more
sophisticated approaches of analysis of variance and regression
equations for specifying IQ.
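
The point that equivalence depends on both correlation and mean difference can be shown with a small sketch. The scores below are hypothetical, invented purely for illustration, and the function names are not drawn from any study reviewed here. Two forms that differ by a constant correlate perfectly and yet are plainly not interchangeable.

```python
import math
import statistics


def pearson_r(x, y):
    """Product-moment correlation for two equal-length score lists."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def equivalence_summary(form_a, form_b):
    """Return (correlation, mean difference A minus B) for paired scores."""
    mean_diff = statistics.fmean(form_a) - statistics.fmean(form_b)
    return pearson_r(form_a, form_b), mean_diff


if __name__ == "__main__":
    # Hypothetical IQs: form B is form A plus a constant 8 points.
    form_a = [85, 92, 100, 104, 111, 120]
    form_b = [score + 8 for score in form_a]
    r, diff = equivalence_summary(form_a, form_b)
    print(round(r, 3), round(diff, 1))
```

A study that reported only the first number would call these forms equivalent; reporting both numbers exposes the 8-point bias.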


Short Forms 

The new WAIS has given a fresh 
impetus to studies involving short 
forms. In an early article concerning 
the WAIS, Doppelt (1956) decided 
upon the tetrad short form (A, V, 
BD, and PA) consisting of the two 
subtests which correlated most highly 
with their respective scale scores in 
Wechsler’s standardization popula- 
tion. Doppelt presented a regression 
equation method of computing the 
FS score which was compared by 
Himelstein (1957b) with simple pro- 
rating. Himelstein found the total 
scores computed by the two methods 
correlated .99 and since the means 
were identical, concluded that the 
clinician may feel free to use either 
method. 
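
Simple prorating, the alternative Himelstein compared with Doppelt's regression method, can be sketched as follows. Doppelt's regression coefficients are not given in this review and are therefore not shown. The function name and the example scores are hypothetical, and the count of 11 subtests is taken from the WAIS as described elsewhere in this review.

```python
def prorated_full_scale(short_form_scores, total_subtests=11):
    """Prorate a short-form sum of scaled scores to a Full Scale sum.

    short_form_scores maps subtest abbreviations to scaled scores.
    """
    if not short_form_scores:
        raise ValueError("at least one subtest score is required")
    n_given = len(short_form_scores)
    return sum(short_form_scores.values()) * total_subtests / n_given


if __name__ == "__main__":
    # Hypothetical scaled scores on the tetrad A, V, BD, and PA.
    tetrad = {"A": 10, "V": 12, "BD": 9, "PA": 11}
    print(prorated_full_scale(tetrad))
```

The prorated sum would still have to be converted to an IQ through the age-appropriate table in the manual, a step left out of this sketch.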

The Doppelt article was the partial stimulus for a rash of
studies (Clayton & Payne, 1959; Fisher & Shotwell, 1959;
Himelstein, 1957b, 1957c; Olin & Reznikoff, 1957; Sines &
Simmons, 1959; Sterne, 1957; Whitmyre & Pishkin, 1958) reporting
the application of Doppelt’s WAIS short form to patient
populations and generally concluding that this abbreviated scale
provided about as valid an estimate of the FS score for
heterogeneous psychiatric subjects as for the standardization
subjects. While correlations range from .92 to .97, it must be
remembered that they are exaggerated since they represent
correlation of parts with the whole. Findings for samples with
restricted range of talent gave lower short form-FS correlations
for homeless men (Levinson, 1957), mental defectives (Clayton &
Payne, 1959), and students (Allen et al., 1956). Both Levinson’s
and Himelstein’s comments (1957a) ignore the constricting effect
of the reduced range of talent in Levinson’s sample on the size
of the obtained correlation, which, when corrected, rises from
.87 to .92. Sterne (1957) similarly found a lower correlation
with organics but the obtained coefficient is highly unreliable
with N=12.
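
The inflation that comes from correlating a part with the whole that contains it can be estimated with the usual part-remainder formula. The sketch below is an assumption-laden illustration: none of the studies cited is said to have used this correction, the standard deviations are invented, and the function name is hypothetical.

```python
import math


def part_remainder_correlation(r_part_whole, sd_part, sd_whole):
    """Correlation of a part score with the rest of the whole (whole minus part).

    Derived from cov(p, w - p) = r*sd_p*sd_w - sd_p**2 and
    var(w - p) = sd_w**2 + sd_p**2 - 2*r*sd_p*sd_w.
    """
    numerator = r_part_whole * sd_whole - sd_part
    denominator = math.sqrt(
        sd_whole ** 2 + sd_part ** 2 - 2.0 * r_part_whole * sd_whole * sd_part
    )
    return numerator / denominator


if __name__ == "__main__":
    # Hypothetical SDs: short-form sum 9 points, Full Scale sum 25 points.
    print(round(part_remainder_correlation(0.95, 9.0, 25.0), 2))
```

With these invented values a short form-FS correlation of .95 corresponds to a noticeably lower correlation with the seven subtests the short form omits, which is the sense in which the published coefficients are exaggerated.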

Using a similar formula to that developed by McNemar for the WB,
Maxwell (1957) determined the correlation of all possible two,
three, four, and five subtest combinations with the WAIS FS for
the 300 subjects in the 25-34 age group of the standardization
population. She concluded: (a) that the accuracy of abbreviated
scales is a function of the number of subtests included; (b)
that while short verbal scales are generally better than
performance scales as predictors of FS scores, a combination of
both verbal and performance subtests is best; (c) that the best
WB and WAIS abbreviated scales are not composed of the same
subtests; and (d) that WAIS short forms are more highly
correlated with the FS than are the WB short forms. The last
conclusion was challenged by Howard (1958) who contends McNemar
made an error and underestimated the correlations between WB
abbreviated scales and FS. Howard (1959) also reported finding
higher WAIS short form-FS correlations in a group of
heterogeneous psychiatric patients than Maxwell found in the
standardization sample, but he recognized that “the differences
appeared to result from the greater variance of the patient
sample.”

Three studies within the last 5 years have considered the
usefulness of WB II abbreviated scales for employee selection
(Sloan & Newman, 1955), with alcoholic outpatients (Schneyer,
1957), psychotics and students (Caldwell & Davis, 1956).


Special Populations and Applications

Intelligence as a function of age. 
Bayley (1957) concerned herself with 
the growth of intelligence between 16 
and 21 years of age in an extension 
of the now famous Berkeley Growth 
Study. In general, subjects improved 
with each testing regardless of intel- 
lectual or educational level. Certain 
individuals, however, appeared to 
have reached their asymptote by 16 
or 18 while others continued to de- 
velop until 21 or older. Although 
acknowledging possible practice ef- 
fects, Bayley did not feel this totally 
accounted for the increments in 
performance. 

Concerned with the encroachments
of old age in a randomly selected
probability sample in Delaware,
Whiteman and Jastak (1957) administered
three subtests of the WB to
1,980 persons and found little decline
with age on C, moderate decline on
PC, and marked decline on DS beginning
at age 35. These differential
deficits in performance accruing with 
age were interpreted as ‘‘a decline in 
certain group and specific factors— 
conative, perceptual, and motoric in 
nature—rather than as a decline in 
general intellectual ability per se.” 
Similar interpretations of the WAIS 
standardization data were made by 
Doppelt and Wallace (1955) and 
Wechsler (1958). Comparing the WB 
standardization population with the 
WAIS standardization population, 
Wechsler (1958) noted that the best 
overall WAIS test scores occurred in 
the 25-29 age interval rather than 
the 20-24 age interval found for the 
WB standardization. Also, the gen- 
eral rate of decline was said to be less 
for the WAIS than for the WB up to 
age 50. 

Doppelt and Wallace (1955) found that allowing the elderly
subjects unlimited time made very little difference in their
scores. The WAIS standardization population scores began to
decline with aging much sooner, and decrement was much more
marked, on the Performance subtests than on the Verbal ones. The
WAIS Verbal subtests hold up fairly well until about 70 years of
age, at which time all subtest performances decline rapidly with
age. Eisdorfer, Busse, and Cohen (1959) questioned the
representativeness of the WAIS Kansas City aged sample (Doppelt
& Wallace, 1955) when 162 volunteer subjects from the Piedmont
section of North Carolina consistently (82%) manifested a
superiority of VIQ over PIQ. This Verbal superiority remained
even when sex, race, socioeconomic, intelligence, and mental
health differences were analyzed separately. It is noteworthy
that the VIQ-PIQ discrepancy for the entire sample is more
attributable to an elevation of the VIQ (106.5) above the norm
than to a depression of the PIQ (98.5). It may be that
their volunteers show a greater rela- 
tive elevation of verbal skills than 
the WAIS standardization sample. 
Loranger and Misiak (1960) found 
DS performance of a group of aged 
females comparable to that of the 
Kansas City standardization sample. 

Sex differences. In the WAIS 
standardization population there 
were consistent but negligible differences
in Verbal, Performance, and FS
scores in favor of the males (Doppelt
& Wallace, 1955; Wechsler, 1958).
Eight of the 11 subtests showed significant
sex differences with men
doing better on five (I, C, A, PC, and
BD) and women better on three (S,
V, and DS). Apparently the rise and
fall of the Mental Deterioration
Index has had little effect on Wechsler’s
habit hierarchy, for he now
proposes a new “WAIS masculinity-femininity
(MF) score” composed of
the F total (V+S+DS) subtracted
from the M total (I+A+PC). In
the Plant and Lynd (1959) norms for 
361 college freshmen there were no 
statistically significant sex differ- 
ences on any of the WAIS IQs but 
subtest scores were not reported. In 
the Berkeley Growth Study (Bayley, 
1957), males were superior on the 
Verbal scale, while females were 
higher on the Performance scale; 
however, there was no evidence for 
an earlier intellectual maturation of 
females. An unpublished thesis by 
Miele (1958) deals with sex differ- 
ences on the WAIS. 
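
The MF score described above is simple arithmetic on six subtest scores and can be written out directly. The function name and the example scores are hypothetical, and the sketch assumes scaled (weighted) subtest scores, a detail this review does not specify.

```python
def wais_mf_score(scaled):
    """M total (I + A + PC) minus F total (V + S + DS), as described in the text."""
    m_total = scaled["I"] + scaled["A"] + scaled["PC"]
    f_total = scaled["V"] + scaled["S"] + scaled["DS"]
    return m_total - f_total


if __name__ == "__main__":
    # Hypothetical scaled scores for one examinee.
    scores = {"I": 12, "A": 11, "PC": 10, "V": 9, "S": 10, "DS": 8}
    print(wais_mf_score(scores))
```

Positive values fall toward the pattern on which men scored higher in the standardization data, and negative values toward the pattern on which women scored higher.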

Educational and vocational applica- 
tions. The general intellectual level 
of college students has long been of 
interest. Plant and Richardson
(1958) recently reported a mean WB 
FSIQ of 116.5 for college freshmen 
volunteers. Wechsler (1958) reported 
a very similar mean. Plant and Lynd 
(1959) found correlations of Verbal, 
Performance, and FS WAIS weighted 
scores with grade point average for 
the freshman year were .58, .31, and 
.53, respectively, which were as good 
or better than similar correlations 
for the ACE. Their normative data 
reveal an expected restriction in 
range of talent. The WB VIQ for 
engineering students has been re- 
ported (Wechsler, 1958) to be not 
only superior to the PIQ but also 
more highly correlated (.41 vs. .08) with college grades.
Weisgerber (1955) concluded that Diamond’s factor analytically
based scoring method designed for vocational counseling with the
WB was not as useful as the VIQ for predicting academic success
of engineering students. At an even higher educational level,
Holt and Luborsky (1959) have indicated their surprise at
finding the WB VIQ to be one of the better predictors of
performance in psychiatric residency training in spite of the
test’s ceiling. Correlations between the WB VIQ and
supervisor-peer ratings on diagnosis, therapy, administration,
management, and overall competence ranged from .27 to .47; even
the correlations with empathy, interest, sensitivity, firmness,
etc. were in the .30s.

A very interesting and thorough study of the relationship
between intelligence (WB) and rated creativity in 64 chemists
engaged in industrial research has been reported by Meer and
Stein (1955). Not too surprisingly, when the entire group was
considered there were generally positive, although not always
significant, relationships among education, intelligence, and
creativity. Their probing analysis, however, led to the
tentative conclusion that:

Where equal opportunity is available higher IQ scores beyond a
certain point [approximately Percentile 95] have relatively
little significance for creative work.


Considering the role of intelligence in managerial positions,
Balinsky and Shaw (1956) found their unique sample had a higher
WAIS VIQ (125) than PIQ (117) and, after correlating the IQs and
subtest scores with overall performance ratings by superiors and
peers, concluded that:

Apparently verbal intelligence and especially arithmetical
ability are important factors in the performance of the
executive personnel.

While one might argue with the authors’ phraseology (“important
factors”), since the data indicated only one (A) of the 11
subtests yielded a significant correlation, the VIQ-performance
rating correlation of .32 was significant at the .05 level.
Another study, by Dunnette and Kirchner (1958), provides some
confirmation of this relationship between intelligence and
managerial effectiveness.

Cultural influences, translations, and ethnic groups. Bloom
(1959) recently compared 67 student nurses
in Missouri with 67 in Hawaii using 
the V and PC subtests of the WAIS. 
The Missouri nurses obtained higher 
scores on both subtests (significant 
at .01 level only for V), and seven of 
the eight hypotheses about ecologic 
difficulty of PC items were con- 
firmed. In a similar fashion, Breiger 
(1956) compared the WB PA per- 
formance of 30 United States Cau- 
casians, 20 Nisei, and 10 German 
refugees. The three groups, matched
on IQ, education, urban-rural residence,
and bilingualism, scored approximately
the same on this subtest
when evaluated in the usual manner,
but a content analysis of stories related
to their own arrangement of
the “Flirt” and “Taxi” items revealed
marked differences. Significantly
more Caucasians than Nisei
project romantic implications into 
the Flirt sequence and abnormal sex 
behavior into the Taxi arrangement. 
Sullivan (1957), in testing 15 and 16 
year olds in Newfoundland, found
rural subjects were handicapped on 
the WB. 

Numerous applications of the WB 
and WAIS to foreign populations are 
evident during this 5-year period, 
and most of these investigators have 
found it necessary to make modifica- 
tions of varying degrees to the test to 
correct for cultural biases. New 
translations of Wechsler’s third edi- 
tion have been made into French 
(Chagnon, 1955) and German 
(Wechsler, 1956). The WB has been 
translated into Danish and tried out 
with institutional cases (Mogensen, 
1958 unpublished). Italian prisoners 
have been tested (Lazzari, Ferrecuti, 
& Rizzo, 1958). Priester (1957), and 
Priester and Kukulka (1958) pre- 
sented a method of comparing 
HAWIE (German WAIS) subtest 
scatter with Wechsler’s diagnostic 
signs. He also compared the HAWIE
with the HAWIK (German WISC)
and the Binet-Bobertag, finding them
sufficiently comparable to be con- 
sidered parallel tests. Cultural as- 
pects of the WAIS in Canadian sub- 
jects (Hopkins, 1957) and in British 
mental patients (Robertson & 
Batcheldor, 1956) have been reported. 
The latter authors concluded the 
British subjects were better on liter- 
ary and poorer on scientific I and V 
items than the American standardiza- 
tion sample; accuracy rather than 
speed characterized the British ap- 
proach. 

More directly to the point were a 
series of discerning articles by Levin- 
son (1958, 1959) who expounds the 
thesis that reliable and valid differ- 
ences between VIQs and PIQs are not 
necessarily the result of pathology 
but may reflect the deviant values 
associated with specific subcultures. 
He substantiates his case by citing 
the WAIS scores of 64 Yeshiva Uni- 
versity students who had been in- 
doctrinated with the traditional 
Jewish cultural values that place 
great stress upon verbal accomplish- 
ments and discount manual skills. 
This group obtained a mean VIQ of 
125.6 but a mean PIQ of only 105.3, 
with 97% of the subjects having a 
higher VIQ than PIQ. 

A well-designed investigation com- 
paring the youngest WAIS standard- 
ization group with 100 Navaho 
Indians of comparable age, sex, 
education, occupation, and rural-urban
residence (Howell, Evans, &
Downing, 1958) afforded a striking 
contrast with the studies of Jewish 
students. The Navaho group ob- 
tained a VIQ of 84.0 and a PIQ of 
95.4, which were significantly lower 
than those of the standardization 
group. Another group, however, 
which also stresses manipulative skills 
more than verbal accomplishments, 
the Southern Negro, showed a slight 
and nonsignificant tendency for the
WB II VIQ to be higher than PIQ
(Davis, 1957). This was true for both
his mental patients with various 
diagnoses and hospital employees, 
but perhaps most significant were 
the absolute levels (mean FSIQ 68 
for the employees and 67 for the pa- 
tients). A question concerning edu- 
cational background of these groups 
arises, and a supplementary investi- 
gation indicated that both groups 
compared favorably on amount of
education with the 1950 census
figures for nonwhites in Florida.
Scarborough (1956) compared 40
venereal diseased patients with 118
control subjects in a complex, poorly
designed study and derived inconclusive
results. His findings suggest that
Southern Negroes do less well on the
WB (IQs ≈ 80) than Southern whites
(IQs ≈ 90) and that the patients of
either race do almost as well as their 
own control group. The Negro sub- 
jects in this and in the Davis study 
did relatively well on OA but poorly 
on D and DS. Just why Scarborough’s 
Negro subjects from Georgia should 
average almost 13 IQ points higher 
than Davis’s Negro subjects from 
Florida is puzzling. 

Some very interesting information 
about the intellectual distribution of 
3,594 unwed mothers placing their 
children for adoption in Minne- 
sota was provided by Pearson and 
Amacher (1956). The mean IQ was 
100.19 with a standard deviation of 
18.36. Although approaching a nor- 
mal distribution, there were fewer 
cases than expected between IQ 83 
and 91. The authors hypothesized 
that these deviations were due to a 
greater proportion of mothers falling 
at the extremes of the intellectual 
continuum placing their babies for 
adoption because of necessity or 
social pressure, while dull normal 
mothers more commonly keep and 
rear their illegitimate children. It is 
noteworthy that ‘‘repeaters’’ ob- 
tained a mean IQ of 93.3. 
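
The comparison with a normal distribution can be made concrete. The following is a minimal sketch, assuming that the reported mean and standard deviation describe a normal curve and that 83 and 91 may be treated as exact interval bounds; the function names are ours and the result is only the number of cases such a curve would predict, not a figure reported by the authors.

```python
import math

def normal_cdf(x, mean, sd):
    """Cumulative probability of a normal distribution at x."""
    return 0.5 * (1.0 + math.erf((x - mean) / (sd * math.sqrt(2.0))))

def expected_cases(n, mean, sd, low, high):
    """Number of cases a normal curve predicts between low and high."""
    return n * (normal_cdf(high, mean, sd) - normal_cdf(low, mean, sd))

# N, mean, and SD as reported by Pearson and Amacher (1956).
# Treating 83 and 91 as exact bounds is an assumption of this sketch.
expected = expected_cases(3594, 100.19, 18.36, 83, 91)
print(f"Cases expected between IQ 83 and 91: {expected:.0f}")
```

An observed count well below this expectation would correspond to the deficit the authors describe.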


Summary. Intellectual growth, as
defined by improved test perform-
ance on the WAIS, continues in our
culture until 25-30 years of age, but
wide individual differences exist in 
the age of maturation ranging from 
the early or middle teens to the late 
twenties or older. Shortly after the 
intellectual peak, however, aging 
makes its first encroachments upon 
perceptual and psychomotor tasks; 
only considerably later does it ap- 
preciably affect verbal skills. Whether 
Wechsler’s (1958, p. 143) conceptual 
distinction between “intelligence”
and “wisdom” (defined by reference
to the ability of the old sage to cope 
with life’s problems) is useful re- 
mains to be seen, but an obvious im- 
plication is that a test for each con- 
cept is needed at least to evaluate the 
hypothesis that both are worthwhile. 
Although sex differences have been 
demonstrated fairly consistently on 
certain subtests, IQs are usually 
comparable. In addition to age and 
sex, a variety of environmental influ- 
ences, such as subcultural background 
and values, education and vocational 
history, socioeconomic conditions, 
etc., may produce diverse and dra- 
matic effects upon intelligence test 
scores. 

Thus, the conclusion of this section 
remains essentially the same as in 
previous reviews although valuable 
new data has been added, namely, 
that a number of variables besides 
pathology affect Wechsler perform- 
ance and consequently must be 
controlled or accounted for in ade- 
quate analyses. Clearly, no one can 
criticize Dunnette and Kirchner’s 
(1958) plea for validity studies in 
specific vocational situations instead
of reliance upon the assumed intrinsic 
validity of a test. 


Refinements and Critiques 


Administration and scoring. In 
contrast to the last review, only one 
paper is concerned with item order
and difficulty of the WB. Rubin- 
Rabson (1956) points out the time 
boundedness of previously established 
item orders and observes the unde- 
sirable ‘‘tendency for items to cluster 
in groups of similar difficulty, [and] 
an abrupt augmentation of difficulty 
from group to group.” 

Two important investigations of 
the effect of administration and test 
taking attitudes were reported.
Masling (1959) slyly coached his
accomplices, as he appropriately calls
them, for “warm” and “cold” roles
to be played when tested by unsus- 
pecting experimenters. Utilization 
of some memorized answers, taped 
sessions, and a set of judges demon- 
strated that the warm role enhanced 
the score in three ways: experimenters
used more reinforcing comments,
gave more opportunity to clarify and
correct answers, and were more
lenient in scoring the warm subjects.
However, these statistically signifi-
cant differences were small.

Nichols (1959) manipulated ego 
involvement and success experience 
for college students taking the WB. 
He concludes:

differences in test taking attitude on the part
of the S and minor differences in testing pro-
cedure on the part of the E do not materially
affect intelligence test scores. [He adds this
important caution] However, since the sub-
jects used in this study were all intelligent
students who are used to taking tests and
doing their best, the results may not be di-
rectly applicable to clinic and hospital
groups.

We would add: or to children. 

The effects of a trusting or skepti- 
cal attitude in student nurses upon 
the WAIS S and PC subtests were 
investigated by Wiener (1957) who 
hypothesized that a distrustful atti- 
tude would increase the “no similar-
ity” or “nothing missing” responses
and thus interfere with performance 
on these subtests. The attitudes were 
measured by a questionnaire, and
distrustfulness was also presumably
reinforced or induced by special in- 
structions. The more distrustful 
students on the questionnaire dis- 
played a stronger tendency to make 
the predicted distrustful comments 
and were lower on both S and PC 
subtests. The experimental instruc- 
tions, however, did not depress the 
subtest score but did increase the 
number of comments suggestive of 
distrust. The results are interesting 
and suggestive, but it should be 
noted that the N was small and that
only difference scores (S−V and
PC−V) were reported.

Guertin (1959) found that various, 
controlled background noises had no 
effect on D performance with a group 
of chronic psychotics. But, again, 
distraction would be more likely for 
subjects who maintain more interest 
in their surroundings, so generaliza- 
tion about the unimportance of noise 
during D administration is most 
hazardous. Blackburn and Benton 
(1957) suggest a more reliable ad- 
ministration and scoring procedure 
for D. They present reliability data 
from several populations and give 
conversion tables. Briggs’ study 
(1960) is reassuring in that only DS 
results were appreciably affected 
when the subject was forced to 
manipulate with his nondominant 
hand. Plumb and Charles (1955) 
studied scoring disagreements to C 
responses and found that experts as 
well as graduate students disagreed 
significantly. Olin (1958) presents 
tables taking into account the sub- 
ject’s age group when prorating IQ. 
Clinicians making prorations of IQ 
in the aged from short forms should 
note that unless Olin’s procedure is 
followed, they are introducing ap- 
preciable error in estimating IQ. 

Factor analyses. Davis (1956) 
derived 10 factors from the WB sub- 
tests, many more than previously 
reported. His use of a narrow range 
of talent emphasizes test or methods
factors as opposed to trait factors 
and increases dimensionality. Saun- 
ders (1959a) observed that Davis 
used a nonuniform procedure in ob- 
taining intercorrelations that also 
could account for the unexpectedly 
large number of factors. Not stopping 
with criticism, Saunders devised a 
crucial test of the dimensionality of 
the WAIS. He divided the subtests 
into odd-even to increase the number 
of variables, thereby avoiding re- 
striction on the number of factors 
forthcoming. From this model study 
he concludes: 

The results are consistent with the efforts of 
some clinical psychologists to interpret the 
Wechsler “psychogram” as a personality
measure provided attention is given to indi- 
vidual items of C and PC. Results are also 
consistent with prior factor studies of the
Wechsler which have found only three to five
[group] factors. 


Cohen (1957b) found factors on the 
WAIS similar to those obtained 
earlier on the WB. Besides a strong 
general intellectual factor he found a 
verbal comprehension, a perceptual 
organization, and a memory factor. 
Findings based upon four age groups 
lead him to conclude: 

This evidence is contrary to Garrett's “dif-
ferentiation hypothesis,” which suggests a
sharp reduction in the importance of the gen-
eral factor by the late teens.

He notes the exception that the
memory factor tends to supplant
much of the general factor in the old
age group. He feels that the rather
low amount of subtest specificity 
encountered helps account for disap- 
pointing outcomes with pattern anal- 
ysis. Zwart and Houwink (1958) 
also found three WAIS subtest fac- 
tors, two of which corresponded 
closely to Cohen’s factors. 

Saunders (1960b) reanalyzed his 
own WAIS data to study the factors 
involved in PC subtest items. Find-
ings are interesting and important to
the WAIS user since three distinct,
clinically meaningful factors emerged. 
In another reanalysis, Saunders 
(1960a) found six factors were neces-
sary to account for I and A responses.
The complexity of I and the inap- 
propriateness of an over-all subtest 
score for pattern analysis is illustrated 
by the appearance of five factors in- 
volved in this single subtest. Three 
factors underlie the A subtest. 

Subtest rationale. Saunders (1959b) 
discusses the rationale of the Wechsler 
subtests in terms of clinically derived 
hypotheses that are consistent with 
early statistical findings. Cohen 
(1957a) similarly discusses WAIS 
subtest rationale in the light of his 
factor analytic findings. 

Levine (1958) concentrated on S 
and separated out the “not alike”
responders. He found they had a 
lower mean IQ and he discusses the 
theoretical implications. In another 
study Levine, Glass, and Meltzoff 
(1957) separated out the “N” re-
versers on DS and found they too
were less intelligent and that their “cognitive
inhibition time” (capacity to delay a
response) was poorer than for con- 
trols. 

Matarazzo and Phillips (1955) 
were interested in the relationship 
between manifest anxiety score and 
DS performance. They believed a 
nonmonotonic function best ex- 
plained their data. When Goodstein 
and Farber (1957) examined the 
relationship between manifest anxi- 
ety and DS score, they included a 
very anxious group to extend the 
range of anxiety upward in the hope 
of clarifying the nature of the rela- 
tionship, but no significant relation- 
ship of any kind could be recognized. 

Heilbrun (1960) calculated the 
intercorrelations of four immediate 
memory tests including WB D for 
brain damaged and control patients. 
All intercorrelations for both groups 
were significant (ranging from .26 to 
.62), suggesting a general memory
factor but, nevertheless, of such re- 
stricted magnitudes as to dictate 
“considerable caution” in deriving 
conclusions regarding an individual’s 
general memory functioning from 
only one test. 
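
One way to see why correlations of this size call for caution is to square them, which gives the proportion of variance two tests share. The sketch below applies that standard conversion to the two ends of the range Heilbrun reports; the function name is ours.

```python
def shared_variance(r):
    """Proportion of variance two measures share, given their correlation r."""
    return r * r

# Lowest and highest intercorrelations reported by Heilbrun (1960).
for r in (0.26, 0.62):
    print(f"r = {r:.2f}: shared variance = {shared_variance(r):.1%}")
```

Even at the upper end of the range, most of the variance in one memory test is not accounted for by another.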

Summary. This section represents 
interest constructively directed at 
how the Wechsler works and what 
can be done to improve it; thus, it is 
disappointing to see that there are 
somewhat fewer articles covered than 
in the previous review. However, the 
quality of the articles is generally 
good. Cohen (1957b) continued to 
contribute methodologically by using 
age groups in factor analytic design. 
Saunders (1959a) has provided us 
with a first look at the specific and 
group factor structure of the Wechs-
ler. His factor analyses of subtest
items have been most productive and
we look forward to further reports of
these findings and the time when he
will bring forth an up-to-date ra-
tionale for all the subtests. Nichols’
(1959) manipulation of ego involve- 
ment and success experience provides 
important information and needs to 
be extended to other populations. 

THE WECHSLER AS A DIAGNOSTIC AID

Personality Variables

Anxiety. In most studies the crite- 
rion measure for anxiety was the 
Taylor scale. Using a wide variety of 
subjects, such as psychiatric aides 
compared with outpatient state hos- 
pital patients; high and low anxiety 
groups of college undergraduates; or
medical compared with psychiatric 
VA patients, Dana (1957b); Good- 
stein and Farber (1957); Mayzner, 
Sersen, and Tresselt (1955); and 
Matarazzo (1955) found no consistent 
relationship between the Taylor scale 
and Wechsler scores. Siegman (1956) 
found that Taylor scale anxiety was 
associated with lowered performance 
on timed subtests only. However,
using a college population, Calvin,
Koons, Bingham, and Fink (1955) 
found a consistent relationship be- 
tween scores on the Taylor scale and 
diminished efficiency on such WB 
items as FSIQ, VIQ, V, I, D, A, BD, 
and OA. Not using the Taylor crite- 
rion, Griffiths (1958) assumed induc- 
tion of anxiety in a group of college 
freshmen exposed to an experience of 
failure in a testing situation. As 
compared to controls, significantly 
lower performance was observed on 
D and I but not on A, OA, or DS. 

Kerrick (1955) found that anxiety 
disrupted over-all performance of 
Air Force trainees on the WB, whereas 
in a similar study, Mayzner, Sersen, 
and Tresselt (1955) failed to observe 
such impairment with college stu- 
dents. Mayzner et al. hypothesized 
that the differences in the findings 
between the two studies might be 
attributable to the appreciable anxi- 
ety of Kerrick’s Air Force trainees, 
who realized the greater relevance of 
the test results to their future careers 
in service, as compared with the 
college subjects. 

Miscellaneous. Tallent (1958) was 
unable to support the clinical inter- 
pretation that ninth grade boys say-
ing “yell fire” to the C “theatre” item
are impulsive behaviorally as judged 
by their teachers. Of course, the 
negative results might equally well 
indicate that teachers have little 
recognition of their students’ impul- 
siveness. Of related interest is the 
finding that “ego delay function,” as
measured by Barron M-threshold ink- 
blots, time estimation, and Stroop 
Color-Word Test, was correlated 
with WB IQ and D (Spivack, Levine,
& Sprigle, 1959). 

The WB has also been evaluated 
as a predictor of continuation in 
psychoanalytically oriented therapy 
(Hiler, 1958). Patients remaining in 
treatment for at least 20 sessions 
averaged about 10 points higher in 
IQ (mean IQ 112) and did better on S
but poorer on D and DS relative to 
the other subtests than the patients 
discontinuing within five sessions. 
McReynolds and Weide (1960) re- 
ported dramatic changes on DS fol- 
lowing prefrontal lobotomies, but 
the subtest given preoperatively was 
not predictive of the degree of psychi- 
atric improvement postoperatively. 


Investigations of Diagnostic Value 


Several studies regarding the gen- 
eral diagnostic usefulness of the WB 
have appeared to reinforce our cau- 
tious, skeptical approach to the 
clinical application of tentative rela- 
tionships between test results and 
psychiatric condition. Frank (1956)
correlated and factor analyzed the
subtest scores of 60 subjects from
nine diagnostic groups which, in a
previous analysis, appeared homo- 
geneous in subtest scores. Only two 
unrotated factors were isolated: VIQ 
and PIQ. The conclusion was that 
“the WB does not yield significant 
data as regards psychiatric diagnosis, 
and continues to sort subjects in 
terms of intellectual factors only.”
Cohen (1955) submitted WB profiles 
of 300 male veteran patients diag-
nosed as psychoneurotic, schizo-
phrenic, or brain damaged to seven
experienced clinical psychologists and 
had them attempt to classify each 
case. Only one of the seven psycholo- 
gists correctly classified a significant 
number (132) of the 300 patients and 
only two others had above-chance 
success in the diagnosis of a single 
diagnostic group which in both cases 
was the brain damaged group. The 
judged classification correlated with
the neuropsychiatric diagnosis be-
tween .13 and .22, which was deemed
far too small to be of use clinically.
It was concluded that there is some 
nonchance relationship between the 
WB pattern and the clinical diagnosis 
but that this relationship is detected
by only a few clinicians and even then
to only a degree having little practi- 
cal value. Despite these and earlier 
studies, some clinicians continue to 
use the test diagnostically with little 
hesitation. 
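
To give a sense of what a “significant number” of correct classifications means here, the following sketch computes an exact binomial tail probability for 132 correct out of 300. It assumes a chance success rate of one in three, which would hold only if the three diagnostic groups were equally represented and guessing were uniform; the source does not state the group sizes, so this rate is purely illustrative and is not Cohen's own test.

```python
from math import comb

def binomial_upper_tail(successes, n, p):
    """Exact probability of at least `successes` hits in n trials at rate p."""
    return sum(
        comb(n, k) * p**k * (1.0 - p) ** (n - k)
        for k in range(successes, n + 1)
    )

# 132 of 300 correct, from Cohen (1955); the chance rate of 1/3 is assumed.
p_value = binomial_upper_tail(132, 300, 1.0 / 3.0)
print(f"P(at least 132 correct by chance) = {p_value:.6f}")
```

A small tail probability shows only that the result is unlikely under guessing. As the correlations of .13 to .22 indicate, it says nothing about whether the accuracy is large enough to be useful.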

Almost at the other extreme, how- 
ever, are the clinicians who discount 
or disregard the possible influence of 
emotional or environmental factors
upon IQ scores. For example, Gar-
field and Affleck (1960) reviewed 24
cases committed to an institution for
the retarded but later judged not
mentally defective and found the IQ
played an important role in the
commitment proceedings. In most
of these cases serious emotional prob-
lems, deprived environments, or un-
cooperativeness existed but were
neglected by the psychometrist who
proceeded to write with finality a
report diagnosing mental deficiency
and indicating a poor prognosis. The
gross misinterpretations and misuses
of the IQ described in this article
should arouse some concern over
maintaining acceptable standards for
practicing psychometrists.

Rabin, King, and Ehrmann (1955)
found long-term schizophrenics were
lower than normals and short-term
schizophrenics on the WB Vocabu-
lary. Normals and short-term schizo-
phrenics did not differ significantly.
Characteristics of the stimulus word
also affected the level of communica-
tion; thus, it seemed that the possible
effects of chronicity, severity of the
pathology, type of verbal material,
and scoring system should all be con-
sidered in investigations involving
verbal behavior of schizophrenics. A
similar, detailed analysis of the WB
Vocabulary performance of brain
damaged patients by Heilbrun
(1958b) revealed no significant differ-
ences between such patients and
physically ill patients either in terms
of accuracy (standard scoring) or
mode of response (categorical, de-
scriptive, equivalent, or functional).
Thus, the concept of “latent aphasia”
was not confirmed. Heilbrun (1958a) 
also assessed the discriminative effec- 
tiveness of D between brain damaged 
patients, psychotics, neurotics, physi- 
cally ill, ward attendants, and college 
students. He concluded that: 

despite the established sensitivity of the D 
test to cerebral pathology, the test still falls 
short of being a useful method of discrimi- 
nating between brain damaged and non-brain 
damaged. 


Measurement of Scatter 


Difference scores. Shortly after 
publication of the WAIS, Jones 
(1956) and McNemar (1957) cau- 
tioned that differences between sub- 
tests may not have diagnostic signifi- 
cance since the distribution of differ- 
ence scores for ‘‘normals’’ extends 
considerably beyond the point of 
statistical significance determined by 
the standard error of measurement, 
e.g., 30% of even the standardization 
population received a statistically 
reliable difference score between cer- 
tain subtests. The median reliability 
of these difference scores was reported 
by McNemar as being .60; hence, 
much of the difference score variance 
is attributable to errors of measure- 
ment. Fisher (1960), correcting 
Wolfensberger’s calculations (1958), 
presented a table for determining the 
significance of a difference between 
VIQ and PIQ on the WAIS and WB. 
Field (1960b), like Jones and Mc- 
Nemar, emphasized the distinction 
between the “abnormality” and the
“reliability” of a WAIS difference 
score and presented useful tables 
indicating abnormality and statistical 
reliability of VIQ-PIQ differences 
and the reliability of subtest dis- 
crepancies singly or in combinations. 
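
The low reliability of difference scores follows from a standard psychometric formula for two measures of equal variance: the average of the two reliabilities, less the correlation between the measures, divided by one minus that correlation. The sketch below applies it to hypothetical values; the subtest reliabilities of .80 and the intercorrelation of .50 are assumptions chosen for illustration, not figures taken from McNemar.

```python
def difference_score_reliability(r_xx, r_yy, r_xy):
    """Reliability of X - Y for two equal-variance measures.

    r_xx, r_yy: reliabilities of the two measures
    r_xy: correlation between the two measures
    """
    if r_xy >= 1.0:
        raise ValueError("r_xy must be below 1.0")
    return ((r_xx + r_yy) / 2.0 - r_xy) / (1.0 - r_xy)

# Hypothetical subtests: each fairly reliable, moderately intercorrelated.
r_dd = difference_score_reliability(r_xx=0.80, r_yy=0.80, r_xy=0.50)
print(f"Reliability of the difference score: {r_dd:.2f}")
```

The more highly two subtests correlate, the less reliable their difference becomes, even when each subtest is itself measured well.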

The abnormality-reliability dis-
tinction is easily seen by noting that
a VIQ-PIQ discrepancy of approxi-
mately 25 points occurred once in
every 100 subjects in the standardiza- 
tion population; thus, a greater dis- 
crepancy might be considered signifi- 
cant or ‘abnormal’ in a statistical 
sense (see tables by Fisher & Field). 
On the other hand, a VIQ-PIQ 
discrepancy of approximately 13 
points would occur only once in 100 
times by chance, i.e., because of errors 
of measurement associated with the 
IQ scores involved in the comparison.
Consequently, a VIQ-PIQ discrep- 
ancy of 13 or greater is not likely to 
be spurious in the sense of a measure- 
ment error, but such “real” differ- 
ences are not unusual in the general 
population until they reach the mag- 
nitude of 25 IQ points or more. Ap- 
parently this distinction has not been 
thoroughly understood or has been 
disregarded. Even Wechsler (1958, 
p. 160) said that “in most instances a 
difference of 15 or more (IQ) points 
may be interpreted as diagnostically 
significant” and at a later point that
“a deviation of two or more scaled 
score units on any subtest from the 
[subtest] mean is a convenient cut-off 
point” in defining what constitutes an
“abnormal deviation.” However,
according to Field’s table involving 
the reliability of differences, a subtest 
must deviate by at least 5.75 weighted 
score points from the mean of the re- 
maining subtests in order to be signifi- 
cant at the .05 level. 
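
The two thresholds discussed above can be approximated from classical test theory. The sketch below is a rough illustration under stated assumptions: an IQ standard deviation of 15, reliabilities of .96 and .93 for VIQ and PIQ, a VIQ-PIQ correlation of .77, and a two-tailed 1% criterion (z = 2.576). These inputs are our assumptions, chosen only for illustration, and are not drawn from the tables of Fisher or Field.

```python
import math

def reliable_difference(sd, r_xx, r_yy, z):
    """Smallest difference unlikely to arise from measurement error alone."""
    standard_error_of_difference = sd * math.sqrt(2.0 - r_xx - r_yy)
    return z * standard_error_of_difference

def abnormal_difference(sd, r_xy, z):
    """Smallest difference that is rare among obtained scores in a population."""
    sd_of_differences = sd * math.sqrt(2.0 - 2.0 * r_xy)
    return z * sd_of_differences

Z_ONE_PERCENT = 2.576  # two-tailed 1% criterion (assumed)

reliable = reliable_difference(sd=15.0, r_xx=0.96, r_yy=0.93, z=Z_ONE_PERCENT)
abnormal = abnormal_difference(sd=15.0, r_xy=0.77, z=Z_ONE_PERCENT)
print(f"Reliable at the 1% level: about {reliable:.1f} IQ points")
print(f"Abnormal at the 1% level: about {abnormal:.1f} IQ points")
```

Under these assumptions the reliability threshold falls near 13 points and the abnormality threshold near 25, which is the pattern described above: a difference can be dependable without being unusual.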

Griffith and Yamahiro (1958) re- 
ported the reliability or stability of 
subtest scatter in a heterogeneous 
group of 55 neuropsychiatric patients 
over an interval of 1-10 years (mean 
duration 42 months). The rank-order 
correlation between subtest scores 
averaged .51 with the higher rho’s 
being associated with test-retest com- 
parisons involving the same form and 
shorter intervals. They cautiously 
conclude that: 


whether the patterns of deviation do or do not 
have personality or psychodiagnostic validity,
the reliability is such that they might have.

Subtest deviation scores from Vo-
cabulary would seem to be a depend- 
able procedure for psychiatric pa- 
tients since Kasper (1958) found no 
significant relationship between rat- 
ings of “morbidity” (Lorr’s Multi- 
dimensional scale) and Vocabulary 
for psychiatric patients. 
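
The rank-order correlation used by Griffith and Yamahiro to index the stability of a subtest profile can be sketched as follows. The two profiles are hypothetical scaled scores invented for illustration, and the handling of ties by average ranks is our assumption about procedure rather than a detail given in their report.

```python
def average_ranks(values):
    """Rank values from 1 upward, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks

def pearson(x, y):
    """Product-moment correlation; undefined if either list is constant."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5

def spearman_rho(x, y):
    """Rank-order correlation: the product-moment correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))

# Hypothetical scaled scores for one patient on eleven subtests.
first_testing = [10, 12, 8, 9, 11, 7, 13, 6, 9, 10, 8]
retest = [11, 12, 7, 10, 10, 8, 12, 7, 8, 11, 9]
print(f"Profile stability (rho): {spearman_rho(first_testing, retest):.2f}")
```

A rho computed in this way for each patient, and then averaged, would correspond to the kind of figure reported above.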

Intellectual efficiency and potential. 
Since the inference of intellectual 
efficiency is sometimes made from a 
minimum of intratest scatter on the 
WAIS Vocabulary, Fink and Shontz 
(1958) analyzed 100 random protocols 
from Wechsler’s standardization and 
100 from physically ill patients in 
order to determine the frequency of 
0-, 1-, and 2-point responses for each 
Vocabulary item. Several deviations 
from the expected frequency for 
stimulus words were noted: e.g., 
WINTER, BREAKFAST, FABRIC, SLICE, 
ENORMOUS, SENTENCE, REGULATE, and 
REMORSE all yielded more one-point 
responses than expected for both 
groups. Brown and Bryan (1957) 
concerned themselves with an “alti-
tude quotient” (IQ based upon the
two highest subtest scores) as an esti-
mate of intellectual potential in 270
young, “nonclinic” WB subjects.
The mean difference between FSIQ 
and the altitude quotient was 24.6, 
with a standard deviation of 8.1; this 
difference tended to diminish with 
increased intellectual maturity (CA) 
and higher IQs. A correlation of .87 
was found between the IQ and the 
altitude quotient in this group. 

Mahrer and Bernstein (1958) ex-
plored performance on repeated
Wechsler Verbal subtest administra- 
tions. They urged subjects to give as 
many answers as possible and scored 
only the best. IQs continued to as- 
cend upon successive administration 
and they feel that this novel approach 
gives a good indication of intellectual
potential. This method was compared
by Thorp and Mahrer (1959) with 
four other more easily calculated 
estimates of potential intelligence: 
(a) intersubtest variability; (b) pro- 
rating the IQ from the highest subtest 
score; (c) prorating the IQ from Vo- 
cabulary; and (d) prorating the IQ 
from the three highest subtests 
weighted by 2.5, 1.5, and 1.0, respec- 
tively, from highest to lowest. For 60 
neuropsychiatric military patients, 
only Methods b and d involving the
higher subtests yielded high correla- 
tions (.80 to .90) with the potential IQ 
estimated by the more laborious 
method. Yet, Mahrer and Bern- 
stein’s method yielded a higher esti- 
mate of potential intelligence “in
almost every case” than the corre-
sponding estimate by the other meth-
ods. These investigators also found a 
negative correlation (—.41) between 
the FSIQ and the increase in IQ when
potential was estimated, which seemed
largely attributable to IQs over 105, 
suggesting a ceiling effect. 
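
Method d above can be sketched as a weighted sum of the three highest subtest scores. The example is a minimal illustration with hypothetical scaled scores; the conversion of the weighted sum to an IQ depends on the test manual's tables, which are not reproduced here, so the sketch stops at the weighted sum.

```python
def weighted_top_three(scaled_scores, weights=(2.5, 1.5, 1.0)):
    """Weighted sum of the three highest scores, highest weighted most."""
    top = sorted(scaled_scores, reverse=True)[:3]
    if len(top) < 3:
        raise ValueError("at least three subtest scores are required")
    return sum(w * s for w, s in zip(weights, top))

# Hypothetical scaled scores on five subtests.
scores = [10, 12, 14, 9, 11]
print(weighted_top_three(scores))  # 2.5*14 + 1.5*12 + 1.0*11
```

Because the weights sum to 5.0, the result is on the scale of a sum of five subtest scores, with the subject's best performances counted most heavily.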

Scatter and diagnosis. By tallying 
the incorrect WAIS PC responses of 
110 normal females and 110 female 
psychiatric patients, Wolfson and 
Weltman (1960) determined the 
errors characteristic of female psychi- 
atric patients. As one might expect, 
psychotics were more likely to give a 
unique response than were neurotics 
or personality disorders, and 81% of 
the patients gave at least one unique 
response. Trehub and Scherer (1958) 
investigated the individual intersub- 
test variability within a sample of 
psychiatric patients composed of 166 
(61.7%) schizophrenics and 103 neu- 
rotics or character disorders. Their 
cutting score indicative of schizo- 
phrenia yielded 72.1% correct identi- 
fication, an improvement of 10.4% 
over the schizophrenic base rate. The 
proportion of misclassifications could 
have been further reduced by using 
only the extremes of the distribution;
however, this necessitates a corre- 
sponding reduction in the number of 
patients about whom diagnostic
statements are made. 
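
The arithmetic behind the base-rate comparison is simple but worth making explicit: a cutting score is useful only to the extent that it beats the accuracy obtained by calling every patient schizophrenic. The sketch below uses the group sizes and hit rate reported by Trehub and Scherer; the function name is ours.

```python
def base_rate(n_target, n_other):
    """Proportion of the sample belonging to the target group."""
    return n_target / (n_target + n_other)

# Group sizes and hit rate as reported by Trehub and Scherer (1958).
rate = base_rate(n_target=166, n_other=103)
hit_rate = 0.721
gain = hit_rate - rate

print(f"Base rate of schizophrenia: {rate:.1%}")
print(f"Gain over base rate: {gain:.1%}")
```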

An obvious limiting factor in the 
usefulness of any diagnostic sign is 
that it may differentiate selected 
diagnostic groups but not be uniquely 
associated with a single nosological 
category. For example, Ladd (1959) 
found that intersubtest variability 
was also greater in a brain damaged 
group than in a comparable neurotic 
group; Diller (1955) reported an in-
flated “mean range ratio” in de-
linquents; and Plumeau, Machover,
and Puzzo (1960) found a higher total 
scatter index for alcoholics. Conse-
quently, other indices are needed to
distinguish one pathological group
from the other and such are the goals
of pattern analysis, to be discussed in
a subsequent section.

Summary. A necessary distinction
has been drawn between the reliability
of a difference in IQ points or subtest
weighted scores and the frequency of
occurrence of such differences in
specified populations. The cautions 
against confusing the two concepts 
should be heeded. Measures of inter- 
subtest scatter frequently distinguish 
groups of delinquents, schizophrenics, 
and organics from normals; however, 
the diagnostic value of this “sign”
alone is negligible since it is clearly 
not unique to any one diagnostic 
group or sufficiently discriminative to 
be reliable in the individual case. 
There are several fairly reliable but 
not necessarily highly correlated 
methods of estimating intellectual 
efficiency or potential, but we must 
wait hopefully for evidence regarding 
the usefulness of such measures. 


Pattern Analysis 


The performance of Wechsler’s
group of adolescent psychopaths
(1944) was characterized by PIQ>VIQ,
OA+PA>BD+PC, and PA>all other
subtests. Using a sample
of sex offenders ranging from 14 to 64 
years old, Wiens, Matarazzo, and 
Gavor (1959) found the PIQ-VIQ 
relationship to be supported, while 
neither Foster’s (1959) adolescent
recidivists, Field’s (1960a) English
recidivists, nor Panton’s (1960) pris-
oners supported it. Foster did find that
OA+PA>BD+PC but that PA ex-
ceeded only BD and D. Graham and
Kamano (1958) found a pattern
similar to Wechsler’s psychopathic 
group in a sample of inmates of a 
federal institution only when they 
were also classified as unsuccessful 
readers; the “successful readers” did
not yield the predicted pattern. 
Purcell (1956) found that in his sam- 
ple of Army trainee delinquents BD 
was least impaired, and that the most 
frequent offenders did poorest on C, 
V, and A. 

A thorough analysis of the WB 
performance of 87 male and 80 female 
juvenile delinquents matched for age, 
grade placement, and global IQ was
made by Diller (1955). The sexes 
were judged equally endowed with 
potential intelligence as indicated by 
prorating the three highest subtests, 
and both obtained a higher PIQ than 
VIQ. In terms of factors previously 
identified by Jastak, the delinquents 
were impaired in “verbal develop-
ment” (V, I, C, S), “motivation”
(A, D, DS), and mildly so in the
“psychomotor area” (BD, DS, I, PA).
The sexes differed in that the males
were superior in “reality contact”
(C, PA, PC, OA), while the females
had more “self control.” Two indi-
vidual subtests showed sex differences 
—PC and DS—with males doing 
better on the former and poorer on 
the latter. 

With regard to subjects addicted to 
alcohol: some chronic alcoholics
showed evidence of pathology (clini-
cally as well as test-wise) typical of
the organic as in studies by Kaldegg 
(1956) and Tumarkin, Wilson, and 
Snyder (1955); while other alco- 
holics, even after 10-30 years of 
intense indulgence, were reported to 
show no apparent gross intellectual 
deterioration (Peters, 1956). Bauer 
and Johnson (1957) found no signifi- 
cant difference on subtest perform- 
ance between chronic alcoholics as 
compared with the general run of 
neurotics or “functional” psychotics.
Plumeau et al. (1960) found that A
was lower for “unremitted” alco-
holics than for either “remitted”
alcoholics or controls. 


Effects of Organic Brain Damage 

Wechsler’s patterns. Wechsler’s 
subtest patterning for organicity was 
not cross-validated by Everett (1956),
Fisher (1958), Ladd (1959), Love
(1955), or Reitan (1959). Wechsler’s
observation that PIQ<VIQ was
found by both Ladd and Love in their 
heterogeneous organic samples, in a 
group of organics with nonfrontal 
lobe lesions by Morrow and Mark 
(1955), in a group with right hemi- 
sphere damage by Klove (1959), in a 
group demonstrating poor “spatial
integration” by Klove and Reitan
(1958), and in a group of normal 
senescents of superior intellectual 
ability by Norman and Daley (1959). 
Eisdorfer, Busse, and Cohen (1959) 
found PIQ<VIQ for an aged group 
and Morrow and Mark observed this 
relationship in their organics grouped 
by foci. 

With regard to Wechsler’s “Hold-
Don’t Hold” ratio: Reitan (1959)
found some support for this pattern 
when using a pathological group as 
compared to Norman and Daley 
(1959) who did not when using nor- 
mal senescents. In this ratio it is as-
sumed that C, I, PC, and OA will be
resistive to the effect of factors con-
tributing to intellectual deterioration.
Reitan’s (1956) organics did not do 
well on C and I, as compared to the 
organics seen by Howell (1955); 
Inglis, Shapiro, and Post (1956); 
Klove and Reitan (1958); and Mor- 
row and Mark (1955). The organic 
samples of Klove, Klove and Reitan, 
and of Morrow and Mark, and Nor- 
man and Daley’s senescents did not 
do well on PC. None of the organics 
assessed by Ladd (1959), Morrow 
and Mark, or Norman and Daley’s 
senescents did well on OA, although 
Klove’s organics did. 

In Wechsler’s ratio it is also as- 
sumed that D, A, BD, and DS will be 
most affected by factors contributing 
to intellectual deterioration. In gen- 
eral, this was supported by the find- 
ings of Klove and Reitan (1958), 
Ladd (1959), Love (1955), Norman 
and Daley (1959), and Reitan (1956). 
However, neither Heilbrun (1958a),
Reitan (1959), nor Ladd found that D
was significantly lower for their or- 
ganics, whereas Klove and Reitan, 
Morrow and Mark (1955), and Tolor 
(1956, 1958) did. Klove (1959) found 
that low D and A were characteristic 
of his sample of patients with left 
hemisphere damage only. The find- 
ings of Heilbrun (1959), Howell 
(1955), Klove, and Parker (1957) all 
attest to the significantly poor per- 
formance of organics on BD, and 
Thaler (1956) found that decrements 
in BD were directly related to aging. 
This is contrary to the performance 
of the organics seen by Fisher (1958), 
and Inglis et al. (1956), or the senes- 
cents seen by Norman and Daley. 
Neither Fisher nor Howell found that 
their samples of organics demon- 
strated any unique difficulty on DS; 
however, the groups seen by Klove, 
Klove and Reitan, and Morrow and 
Mark did. Moreover, the data of 
Loranger and Misiak (1959), Norman
and Daley, and Thaler demonstrate
that DS performance, as with
BD, declined with age. Yet Hall 
(1956) observed that the organic 
pattern, DS+BD<I+V, frequently 
occurred in nonorganic patients. 

Hewson ratio. Everett (1956) found no
significant relationship between the
presence of organicity and the Hewson
ratio, while McKeever and Gerstein
(1958) found that the Hewson ratio
classified 75% of a group of
schizophrenics as organics. Bryan and
Brown (1957) found that the Hewson
ratio identified 27% of a nonorganic
group as organic, that 38% of a group
of adolescents suspected of having CNS
involvement on the basis of clinical
data were identified as organic, but
that 67% of patients with known organic
involvement of a “mild” degree, and 96%
of patients with a “moderate” to
“marked” degree of organic impairment
were correctly identified as organic.

Effects of specific organic involve- 
ments. Bressler (1956) found that 
PIQ significantly differentiated
aphasics from normals, but not or- 
ganics with aphasia from those with- 
out. Fisher (1958) found that 
paretics demonstrated selective im- 
pairment on subtests and that paresis 
affects verbal abilities to as great an 
extent as performance. Klove and 
Reitan (1958), Milner (1958), and 
Reitan (1955) found that patients 
with left hemisphere lesions do poorer 
on verbal tests as compared to those 
with right hemisphere lesions, the 
latter doing poorer on performance 
tests. Heilbrun (1956) also found 
lower verbal scores for left hemi- 
sphere lesions but failed to find that 
their performance scores were better 
than for the right hemisphere group. 
Bortner and Birch (1960) found left
hemiplegics had more difficulty with
BD than right hemiplegics, but the N
was small and the task involved only
recognition.

Thaler (1956) found that patients 
with normal and focal EEG tracings 
perform better on such tests as V, I, 
BD, and DS as compared to those 
with mixed or diffuse tracings. How- 
ever, Morrow and Mark’s data (1955) 
suggest (a) no significant difference 
in the performance of patients with 
either focal or diffuse cortical lesions; 
(b) patients with frontal lobe lesions 
showed only slight intellectual im- 
pairment, save on DS, while patients 
with lesions dorsal to the Rolandic 
fissure demonstrated a tendency to- 
ward greater intellectual impairment; 
and (c) patients with left hemisphere 
damage demonstrated a tendency to 
loss in VIQ and PIQ, whereas pa- 
tients with bilateral lesions showed 
loss in PIQ only. 

Summary. Research findings in 
this section are at best inconsistent 
and, hence, inconclusive. One study 
demonstrated a superiority of predic- 
tions based on behavioral data as 
compared to a few a priori test pat- 
terns (Gaston, 1959) and another, 
the difficulty of even some seasoned 
clinicians to sort test profiles into 
gross categories of neurosis, schizo- 
phrenia, and organicity (Cohen, 
1955); and Frank (1956) found the 
same inability to sort patterns even
when the “sorter” is factor analysis.
Yet in spite of the continued equivo- 
cality of the findings, faith persists in 
the assumption that a test of cogni- 
tive functions should be able to reveal 
more about a person than just his IQ. 
This faith may not be completely 
unjustified. 

One might ask whether the sup- 
portive evidence might not be chance 
phenomena, whether the persistent 
inconsistency of the findings from 
review to review does not strongly 
suggest the fruitlessness of attempt- 
ing to make assessment of Wechsler 
patterns. Yet the frequent occurrence
of positive studies may be regarded
as evidence that analysis of
patterns can be meaningful and that 
something other than the tool itself 
might account for the failure of the 
research to provide consistent and 
definitive answers. 

One of the methodological short- 
comings is the failure to distinguish 
between a mean diagnostic group 
profile and modal patterns of homo- 
geneous subjects in a diagnostic 
group. While there is only one mean 
group profile for a sample, several 
groups of the subjects may form 
clusters of homogeneous symptoms 
with rather dissimilar modal patterns. 
Furthermore, the group profile cannot
be expected to conform to any of the
modal patterns since it is a statistic,
and no single subject should be
expected to correspond to the mean
group profile. Only modal patterns are
appropriate for diagnostic purposes.
Wechsler (1944) fails to identify the
nature of his proposed diagnostic
patterns. Since only one is given for
each diagnostic group it seems likely
that he has proposed the relatively
useless group profile; at least, this
is presumed by most investigators in
checking the validity of his proposals.
Only a clear understanding of these
simple principles can lead to a
respectable research approach to
diagnostic pattern analysis.
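
The distinction between a mean group profile and modal patterns can be
illustrated with a small numerical sketch. The Python below is not drawn
from any study reviewed here; the subtest scores, the two clusters, and
the helper names (`mean_profile`, `distance`) are all hypothetical,
chosen only to show that when a sample contains two homogeneous
subgroups with opposite patterns, the averaged profile is nearly flat
and lies far from every individual subject.

```python
from statistics import mean

# Hypothetical scaled scores on four subtests (I, C, BD, DS) for two
# homogeneous subgroups within one diagnostic sample. Invented values.
cluster_a = [[12, 12, 6, 6], [13, 11, 7, 5], [12, 13, 6, 6]]  # verbal high
cluster_b = [[6, 6, 12, 12], [7, 5, 13, 11], [6, 6, 12, 13]]  # performance high


def mean_profile(subjects):
    """Average each subtest over subjects, giving one profile."""
    return [mean(column) for column in zip(*subjects)]


def distance(profile_p, profile_q):
    """Sum of absolute subtest differences between two profiles."""
    return sum(abs(p - q) for p, q in zip(profile_p, profile_q))


group = cluster_a + cluster_b
group_profile = mean_profile(group)   # the single mean group profile
modal_a = mean_profile(cluster_a)     # pattern typical of subgroup A
modal_b = mean_profile(cluster_b)     # pattern typical of subgroup B

# How close does the best-fitting subject come to each kind of template?
closest_to_group = min(distance(s, group_profile) for s in group)
closest_to_modal = min(
    min(distance(s, modal_a) for s in cluster_a),
    min(distance(s, modal_b) for s in cluster_b),
)

print("group profile:", group_profile)
print("closest subject to group profile:", closest_to_group)
print("closest subject to own modal pattern:", closest_to_modal)
```

With these invented scores the best-fitting subject differs from his own
cluster's pattern by a total of about 1 scaled-score point across the
four subtests, whereas no subject comes closer than about 12 points to
the group profile. This is the sense in which the group mean is a poor
diagnostic template.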

An analysis of the investigations 
beyond the results per se suggests 
that much is still to be desired with 
regard to the designs of the research. 
For instance, from a purely methodo- 
logical point of view, one might 
wonder whether or not clinical facts 
are being sacrificed for statistical 
significance. In light of the many 
variables other than intelligence and 
psychopathology that tend to affect 
subtest performance and greatly expand
error variance, the arbitrary limits of
the .01 or .05 level of confidence might
be too high. Yet a pattern that fails to
discriminate between groups at these
levels of confidence would seem too weak
to use clinically with individuals.

One might also be disappointed 
at the seeming lack of flexibility 
and/or creativity regarding the form 
of these experiments. The majority 
of the studies employed the matched 
group design using a statistically 
simple test of an inference (chi square 
or t). Zero-order statistics are seldom 
suited to the complex analysis or 
identification of multidimensional 
patterns. Of the many studies surveyed
in this section only two went beyond the
single or simple multiple correlational
techniques into factor analysis, only
six went beyond a t or the utilization
of F as a multiple t, and only three
studies made use of an analysis of
variance design to test interaction
effects.

One might also show concern re- 
garding the samples of subjects upon 
which the conclusions are based. 
Samples of organics employed have 
been observed to contain such dis- 
parate kinds of pathology as tumors, 
vascular pathologies, infectious dis- 
eases, various kinds of head trauma, 
epilepsy, and developmental anomalies.
Included in a single sampling
have been patients with lesions which
have been both focal and diffuse, have
involved different lobes, have been
uni- and bilateral, or have been both
cortical as well as subcortical in
nature. Similarly, in the research on
the “character disorder,” the sorts of
behavior included in such a grouping 
might vary from such offenses as 
delinquency, to burglary and dope 
peddling, to assault, rape, and arson. 

One might note that McKeever 
and Gerstein (1958) found that meas- 
ures of organic deterioration varied 
systematically with age, and Fry 
(1956) found that the process of
deterioration was not the same in
people with limited intellectual
capacity as compared to others.
Grouping subjects by the criteria
involved in the classification of
“character disorder,” or some synonymous
phrase, proves to be no more valid.
Lazzari, Ferracuti, and Rizzo (1958) 
found significant differences in the 
mean IQ of samples of delinquents
just on the basis of crime committed, 
i.e., fraud vs. rape. Wiens, Mata- 
razzo, and Gavor (1959) found that 
upon more extensive and intensive 
study, patients initially diagnosed as 
“character disorder” turned out to be
sociopathic personalities, inadequate 
personality types, mental defectives, 
adjustment reactions in adolescence, 
adult situational reactions, depressive 
reactions, neurotics, schizoid per- 
sonalities, and even schizophrenics 
and organics. Therefore, there is
reason to assume that such
heterogeneous groupings introduce a
variety of systematic effects which
would detract from the identification
of consistent and meaningful patterns
associated with the disorder.

In the experiments reviewed herein, 
investigators have attempted to off- 
set the effect of certain variables by 
the method of randomization. Yet 
there is some doubt (Cohen, 1955) 
that this is an entirely effective 
procedure in equalizing the influence 
of such factors as age, IQ, education,
etc. It is still not certain whether 
some of the confusion in the findings 
might not be attributed to the inade- 
quacy of such a procedure. No in- 
vestigators actually sought to deter- 
mine whether the range of age, edu- 
cation, and IQ within the samples 
made for a significant lack of homo- 
geneity of the groups. 

It would appear that systematic
research is still necessary to
satisfactorily establish diagnostic
patterns. We wonder whether the present
interest in pattern analysis of organic
brain diseased patients will persist or
whether, like the former search for
schizophrenic signs, it will go
unrewarded and evaporate. We hope that
Reitan’s current use of carefully
specified types of organic patients for
investigation will yield significant
patterns and point the way for similar
investigation of homogeneous groups of
schizophrenics.


GENERAL SUMMARY 


The WAIS is a much improved 
instrument when compared with its 
predecessors. It measures pretty 
much the same thing that a number 
of other standardized methods at- 
tempt to do. However, comparative 
studies of the instrument suffer from 
methodological shortcomings and rely 
excessively on correlational tech- 
niques and insufficiently on compari- 
sons of mean scores. 

The test has quickly become do- 
mesticated in the various research 
and clinical settings and has produced 
some interesting findings reflecting 
age differences, sex differences, and 
relationships with an array of differ- 
ent educational, vocational, socio- 
economic, and environmental factors. 
There is, perhaps, a need to attempt 
to set up such studies in a broader 
and deeper theoretical framework 
rather than to continue isolated 
forays in the flatlands of pure
empiricism. Wechsler (1958) has “become
increasingly convinced that
intelligence is most usefully
interpreted as an aspect of the total
personality ... an effect rather than a
cause.”

Actually the studies on anxiety, 
impulsiveness, distrust, etc. included 
in this review are beginnings in the 
right direction. Inferring other per- 
sonality variables from intellectual 
functioning is really an important 
avenue to diagnosis. When the concept
of diagnosis is thus more broadly
conceived, as personality assessment,
we need not concur with Meehl (1960) in
his pessimistic prognostication.


The additional work on “scatter,”
profiles, and patterns has not led us
on more solid diagnostic ground. The
results with the several nosological
categories are inconclusive. Severe
methodological shortcomings of the
investigations prevent the isolation
of modal profiles useful for diagnosis.
It is perhaps time to face the
challenge embodied in Binder’s (1956)
study of schizophrenia. Is there a
differential intellectual impairment?
Binder answered the question in the
negative by demonstrating an over-all
reduction in schizophrenic functioning
when assessed with an instrument (SRA
tests) which measures relatively
independent abilities. Relatively
independent factors of mental ability,
isolated from the WAIS, might serve as
a sounder basis for future diagnostic
studies of nosological groupings.

Finally, we must mention again the
inadequacy (heterogeneity) of the
criterion: schizophrenia, character
disorders, etc. We discussed the issue
in detail elsewhere (Rabin & King,
1958) and have recommended “The
selection of a specific frame of
reference in the determination of
samples ... chronicity, or reactive vs.
process” as an avenue and approach to
more fruitful research.
RESEARCH USES OF DOLL PLAY

HARRY LEVIN AND ELINOR WARDWELL


The history of science offers many 
examples of potentially useful theo- 
ries that did not realize their promise 
until appropriate methods had been 
devised. Once methodology becomes 
available, a flood of research often 
follows, which in turn tries the theory 
and results in new formulations that 
in their turn wait on experimental de- 
vices. Such progress may falter either 
because of an absence of theoretical 
speculation or the lack of methods, 
and it is futile to assign prior im- 
portance to one or the other. 


Such interdependence becomes clear
when we compare the influence that
Binet’s scales had on the theories of
intellective behavior in children to
the relative dearth of systematic
research on early personality
development, although there is
certainly no lack of theories
concerning the latter problem. However,
standardized methods for appraising
personality variables in preliterate
children are in short supply.

Reasons for the dearth of methods 
are not hard to find. Research with 
preschool children presents certain 
special problems. The instructions 
and operations must be simple enough 
for young children to understand. 
The subjects must have the physical 
abilities to perform whatever acts 
are demanded by the method. Per- 
haps most important, the tasks must 
entice and maintain interest against 
a brief span of attention. Further,
for personality research minimal
determination of the child’s behavior
as artifacts of measuring devices is
desirable.

1 We are indebted to the participants in the
workshop on doll play methods at the 1957
American Psychological Association meetings
for many suggestions which appear in this
paper. We especially thank Mary Ford, Clara
Melville, Judy Rosenblith, and Richard
Walters for their comments on the manuscript.

Few extant methods meet these 
criteria. Bellaks’ CAT (1950) and the 
Rorschach (e.g., Ames, Learned, 
Metraux, & Walker, 1952) are widely 
and profitably used but present prob- 
lems. They rely solely on children’s 
language responses and are, in the 
writers’ opinions, too dependent on 
passive, nonacting out, behavior. 
Likewise, interviews with children,
because they depend on the child’s
understanding of language, may
introduce many idiosyncratic factors.

Doll play has offered the promise of 
range and flexibility in personality 
research. It is the purpose of this 
paper to summarize the research uses 
of doll play and to assay the results of 
the promise which this method offers. 

Doll play started as a clinical device.
Anna Freud (1928) attributes its first
use to Melanie Klein, who employed it as
a procedure both for the diagnosis and
treatment of disturbed children.
However, the concern of this survey is
with the research rather than the
clinical uses of doll play. By research
we shall mean that variables have been
measured by this method and related to
other variables. The studies reviewed
cover the period 1933 to 1960.

What is doll play? There are nu- 
merous variations, but essentially the 
young child is presented with a set of 
dolls (such as a family) and a setting
in which dolls are to operate (such as
a home) and told to manipulate the
dolls while he tells a story about them. 
The child has an opportunity here to 
talk as well as to act. Endless changes 
can and have been rung on this basic 
theme: the composition of the dolls,
the nature of the setting, the amount 
and kinds of interaction with the re- 
searcher, the directions and structure 
presented to the child, etc. A host of 
variables has been scored from the 
children’s protocols. Chief among 
them are aggression, stereotypy, and 
prejudice. The method appears to be 
more useful for some of these vari- 
ables than for others. 

Although doll play was used in re- 
search before their work, a strong 
impetus to doll play as a method in 
the study of personality development 
was the work of R. R. and Pauline S. 
Sears at the Iowa Child Welfare Re- 
search Station in the mid-1940’s. 
The original studies, after Bach 
(1945) indicated the potential value 
of the method, were methodological 
and will be discussed below. The 
serendipitous occurrence of marked 
sex differences in doll play led to more 
recent work by these investigators 
and their co-workers on identification 
in children, providing one case where 
a characteristic of a method led to 
new theory, rather than the converse 
textbook ideal. 


Because of the theoretical
dispositions of the early
investigators, the most frequent
variables measured were derived from
behavior theory and were indices of
acquired drives in children. Hence,
more than any other behavior, fantasy
aggression has been measured by this
technique, and it was the happy
confluence of theory and method that
this particular behavior is frequently
elicited in doll play.

A host of differences exist among 
the subjects, equipment, and proce- 
dures in the studies which will be re- 
viewed. The following sections are 
organized to indicate the modal find- 
ings or procedure and to sketch the 
range of variations from the typical 
occurrence. 


METHOD 
Subjects 

The usual subjects in doll play re- 
search have been preschool children. 
However, children between the ages 
of 5 and 10 have been used in many 
investigations, and in three studies 
the subjects were up to 13 years old 
(Honzik, 1951; Levy, 1933; Witkin, 
Lewis, Hertzman, Machover, Meiss- 
ner, & Wapner, 1954). Subjects at 
the extremes of the age range have 
usually required procedural adapta- 
tions. Heinicke (1956), in his study 
of 2-year-olds, found that children of 
this age did not use dolls as agents of 
actions. He felt that he gathered 
meaningful data about his subjects 
by putting them in a doll play situa- 
tion, but in view of the types of 
variables which yielded results—rate 
of play, calling for parents, seeking 
the observer’s affection, hostility to 
dolls and other play objects—it would 
seem that the findings were incidental 
to the doll play method. At the upper 
age levels subjects usually have been 
instructed to regard the dolls as char- 
acters in a play or movie. Only one 
study has used adult subjects— 
Rosenzweig and Shakow (1937) com- 
pared the constructions of play ma- 
terials by adult psychotics to those of 
normal adults, and concluded that 
their subjects responded favorably to 
the technique. 

Most research with doll play has
employed white subjects. Occasionally,
in studies of racial identification and
prejudice, Negroes have been used
(Goodman, 1952; Graham, 1955; Radke &
Trager, 1950; Stevenson & Stewart,
1958). The method has also been used
successfully with American Indian
groups (Gewirtz, 1950) and with
children in a primitive society (Henry
& Henry, 1944).

Both boys and girls have served as
subjects. The only indication that
there might be sex differences in
willingness to play with dolls comes
from Finch (1954), who reported that
among her subjects, aged 3-8, those
from all-boy families refused to
participate. However, since her
procedure involved doll play in the
home as well as in the laboratory, the
findings may not be typical.


Equipment 


There is no standard material for 
the construction of dolls—they may 
be made of plastic, wood, clay, 
celluloid, rubber, stuffed fabric, or 
pipe cleaners and cardboard. Their 
clothing may be nonexistent, simple 
or elaborate, removable or permanent. 
However, in the majority of studies 
reported, the dolls were 1.5 to 6 inches tall,
realistically dressed, and were flexible 
so that they could be bent to standing 
or sitting positions. The “standard 
doll play family,” to the extent that it 
exists, consists of father, mother, boy, 
girl, and baby. This number can 
either be reduced or expanded to 
study particular interactions—e.g., 
restricted to mother and child (Isch, 
1952); to mother, baby, and older 
brother or sister (Levy, 1933); or ex- 
panded to include maid (Bryan, 1940); 
teacher (Bach, 1945; Melville, 1959); 
grandparents (Halnan, 1950; Johnson,
1952); or additional siblings (Bryan,
1940). Sometimes the subject is given 
dolls which duplicate his own family 
(Bremer, 1947; Halnan, 1950; Hol- 
way, 1949; Johnson, 1952; Radke, 
1946; Ryder, 1954), or he is presented 
with a large number of dolls of differ- 
ent age-sex categories, and given his 
choice (Goodman, 1952; Henry & 
Henry, 1944; Korner, 1949). 

The dolls are typically presented in,
or in front of, some indoor setting.
Most common is the use of a five- or
six-room house which has fixed wooden
or cardboard walls, but no roof. The
house is usually filled with realistic,
movable doll furniture which has few
manipulatable parts. Sometimes no house
is used; instead, the child is given
furniture which is either organized
into “rooms” or lined up in rows
(Bryan, 1940; Finch, 1954; Goodman,
1952; Halnan, 1950; Holway, 1949;
Johnson, 1952; Korner, 1949; Phillips,
1945; Pintler, 1945; Radke, 1946;
Robinson, 1946; Ryder, 1954).
Occasionally blocks are available,
making it possible for the child to
construct walls if he desires them
(Bryan, 1940; Pintler, 1945). Settings
other than houses have been employed in
rare instances, e.g., a complete
neighborhood (Meister, 1948), a school
room (Bach, 1945; Melville, 1959), or a
scale model of a backyard filled with
play equipment (Bremer, 1947).

Procedure 

Typically, the subject is brought into
an experimental room, shown the dolls
and other equipment, and told that he
may play with them in any way he
wishes. Sometimes it is suggested that
he make up a story (Bach, 1945, 1946;
Bach & Bremer, 1947; Hollenberg, 1949;
Johnson, 1951; Krall, 1953; Levin,
1955) but even in these cases the
direction of the fantasy is left
completely to the child.

The interaction between experimenter
and subject is usually controlled to
some extent: the experimenter may avoid
interaction whenever possible (Bryan,
1940; Honzik, 1951); he may limit the
frequency of interaction, usually
according to the levels established by
Pintler (1945), which will be discussed
later; or he may control the situation
only in the sense of adopting a
constant attitude of noninterfering
permissiveness and attentiveness (Bach,
1946; Bach & Bremer, 1947; Holway,
1949; Levin, 1955; Ryder, 1954).

In studies whose primary aim is to
compare the results of free doll play
with results of other measures of
personality (Ryder, 1954; Simpkins,
1948; Witkin et al., 1954) one session
of play may be all that is used. Most
experiments, however, provide for the
analysis of session-to-session changes,
usually with two 20-minute sessions a
few days apart. Some studies have used
more than two sessions (Bremer, 1947;
Heinicke, 1956; Hollenberg & Sperry,
1951; Isch, 1952; Johnson, 1951;
Phillips, 1945; Pintler, 1945).

In most cases, the session length is 
determined beforehand, but even 
though measures are taken only dur- 
ing the standard time, the subject 
may be allowed to continue playing 
as long as he wants (Bremer, 1947). 
Some workers have not limited the 
session time, but have recorded all 
responses until the child lost interest 
(Goodman, 1946; Ryder, 1954). The 
latter procedure, of course, makes 
imperative the use of response propor- 
tions instead of response frequencies 
as measures. 
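
The need for proportions when session length is unrestricted can be seen
in a short sketch. The following Python uses invented tallies and a
hypothetical helper, `response_proportions`; the category names are
placeholders rather than a scoring scheme taken from any of the studies
cited.

```python
def response_proportions(counts):
    """Convert raw category counts for one child into proportions
    of all scored responses, so that sessions of unequal length
    can be compared."""
    total = sum(counts.values())
    if total == 0:
        # No scored responses: report zero for every category.
        return {category: 0.0 for category in counts}
    return {category: n / total for category, n in counts.items()}


# Hypothetical tallies. Child B played about twice as long as child A
# before losing interest, so every raw frequency is doubled.
child_a = {"aggression": 10, "stereotyped": 30, "other": 10}
child_b = {"aggression": 20, "stereotyped": 60, "other": 20}

prop_a = response_proportions(child_a)
prop_b = response_proportions(child_b)

print("raw aggression counts:", child_a["aggression"], child_b["aggression"])
print("aggression proportions:", prop_a["aggression"], prop_b["aggression"])
```

Here the second child produces twice as many aggressive responses simply
because he played longer, yet both children devote the same share of
their responses to aggression.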

In studying specific variables which 
would be unlikely to occur with sufficient
frequency to give useful results
under the free play procedure out- 
lined above, investigators have used 
a more directive approach in which 
the setting of the story is specified. 
The measurements may be mainly in 
terms of the dolls’ actions, as when 
Levy (1933) records what the subject 
has the older doll do when it sees its 
baby sibling at the mother’s breast, 
or doll play may frankly be used as 
an aid to make it easier to talk to 
children. In the doll play interview 
used by Ammons and Ammons 
(1952), the movement of the dolls is 
often only an adjunct to enable chil- 
dren to express feelings when they 
are having difficulty in verbal expres- 
sion. The same seems to be true of 
Conn’s (1938) study of carsickness, 
and of Levy’s (1940) and Conn’s
(1940) studies of reactions to the
discovery of genital differences.

Studies of prejudice (Goodman, 
1946; Radke & Trager, 1950) have 
confronted subjects with direct 
choices between white and Negro dolls 
to reveal their concepts of the status 
of the racial groups and their prefer- 
ences for them. In addition, Good- 
man used a story completion tech- 
nique, in which the subject decided 
which doll won in cases of conflict. 

The story completion technique is 
not necessarily restricted to the study 
of one variable. Since the completion 
of a prestructured story takes only a 
short time, a variety of situations can 
be presented, offering the advantage 
of overall scores as well as specific 
ones. Stamp (1954) and Walsh (1956) 
had their subjects complete a number 
of stories, including one free story 
which the child made up himself. 
Several other studies have used a 
combination of free play and story 
completion (Halnan, 1950; Johnson, 
1952; Winstel, 1951; Wurtz, 1957). 

D. B. Lynn (1955) has developed a 
Structured Doll Play Test (SDPT) 
which presents the child with 10 situa- 
tions in a given order, each with a 
prescribed arrangement of dolls and 
furniture. The child completes the 
story, which in some situations in- 
volves a clear-cut choice—e.g., be- 
tween bottle and cup, crib and bed, 
mother and father—thus facilitating 
objective scoring. The SDPT has 
already been used in investigating 
age and sex differences (R. Lynn, 
1955) and the effects of father ab- 
sence in Norwegian sailor families 
(Lynn & Sawrey, 1959), and an exten- 
sive program of research using the 
test is planned (Lynn & Lynn, 1959). 

Certainly the effort to get a more 
standardized procedure to insure 
comparability among studies is
worthwhile. At present, the great
variety of materials and procedures
which have been employed makes such
comparisons of unknown significance. One
way to overcome this difficulty is to 
follow the line suggested above—i.e., 
to develop a standard procedure and 
use it throughout an extended re- 
search program. However, it is 
readily apparent that no one method 
can fit the needs of every research. 
For example, Wurtz (1957), after 
trying to use responses to incomplete 
stories as an index of guilt, concluded 
that the technique was too highly 
structured for his purposes. It seems 
that in addition to standardization 
an attempt should be made to clarify 
the effects of variations in equipment 
and procedure in order to help explain 
results already obtained, and to offer 
the prospective research worker infor- 
mation that will allow him to select 
the best conditions for his purposes. 

To a large extent, the worker cur- 
rently faced with a choice of doll play 
procedures is offered very little ad- 
vice. The best way to learn how to do 
a good doll play study still seems to 
be to collect the “lore” from someone
with experience in the area. Some of 
this information is given as hints in 
research reports, but it is scattered 
and because of its dependence on the 
specific conditions may not be gen- 
erally useful. For example, Ammons 
(1950) found that there were signifi- 
cantly fewer refusals to respond in a 
doll play interview when simple 
alternatives were given, when the
subject was asked what the doll 
would do rather than what it would 
say, when the items were affect- 
loaded, and when the subject was 
asked to verbalize the feelings of 
child, rather than adult, dolls. These
kinds of “hints” will be useful to
anyone planning to use a doll play
interview with a sample similar to
Ammons’ (boys, aged 2-6), but we cannot
say whether they have application to
free doll play or to other age-sex
groups. Similarly, it would
be valuable to be able to predict how 
much the child will identify the dolls 
with his own family members, but no 
attempt has been made to find ways 
of influencing this variable. Bach 
(1945) reports that among his nursery 
school subjects, any insistence by the 
experimenter that the child identify 
with the dolls led to resistance by the 
subject. Within the same approxi- 
mate age range, Despert (1940) found 
14 out of 15 subjects who made at 
least some specific identification of 
the dolls with their own families, 
while Finch (1955) reports little suc- 
cess in getting children to act out 
parental roles in relation to dolls in 
the laboratory. 

The major attempts to evaluate the
effects of equipment and procedure have
been made in research under the
influence of R. R. Sears. Phillips
(1945) found that the only effects of
giving the subject highly realistic
dolls and furniture rather than having
him play with unclothed dolls and
“furniture” of simple wooden blocks
were increased exploratory behavior and
less time spent in organizing the
materials. Pintler’s (1945) study of
the effect of organization of the
equipment disclosed that when the
furniture and walls of the house were
arranged in irregular rows instead of
being organized into rooms, children
spent more time in organizational
behavior. Giving the subject a doll
family that duplicates his own has been
shown to produce more identification
with the dolls than does the use of a
standard family (Robinson, 1946). In a
study comparing yard and house
settings, Bremer (1947) found that the
use of a house led to more
inappropriate organizational behavior,
whereas having the dolls placed in a
yard setting with picnic, garage,
sandbox, slide, and swing produced more
nonstereotyped thematic fantasies, more
theme changes, and more total
aggression.

In the investigation of the effective- 
ness of the experimenter in maintain- 
ing rapport and stimulating the child 
to elaborate themes in play within the 
experimental situation (Pintler, 1945) 
it was found that high interaction 
between experimenter and child (be- 
tween 15 and 20 of such stimulating 
interacts in 5 minutes of play) pro- 
duced more nonstereotyped fantasies, 
more theme changes, more aggression, 
and an earlier onset of aggression 
play than did a low interaction level 
(less than 5 interacts in 5 minutes). 
Studying the effect of the length and 
number of experimental 
Phillips (1945) found no differences 
between the fantasy material pro- 
duced in three 20-minute sessions and 
that in a single hour-long session. 

The sex of the experimenter also 
seems to affect results. Subjects show 
more aggression in the presence of an 
experimenter of the same sex (Caron 
& Gewirtz, 1951). 

The above summary represents 
our total substantive information 
about the effects of procedural varia- 
tions. Even our information about 
those variables that have been in- 
vestigated is limited, since most of 
them have been studied in isolation 
or in combination with only one other 
manipulated variable. Each is tied to 
the particular age group on which it 
was used, and has been tested only 
with respect to a limited number of 
dependent variables. For example, 
perhaps the most widely used refer- 
ence of those in the above discussion 
is the work of Pintler (1945). Several 
studies (Bremer, 1947; Jeffre, 1946; 
Krall, 1953; Phillips, 1945; Robinson, 
1946; Scott, 1954; Sears, 1951; Sears, 
Pintler, & Sears, 1946; Yarrow, 1948) 
used “Pintler’s high interaction level”
or “Pintler’s low interaction level” as
standards which have been demonstrated
to have certain effects. There
is no doubt that this work is a valu- 
able contribution; however, it ap- 
pears that there is much knowledge 
still to be gained on interaction level 
before complete understanding of its 
role is reached. Pintler’s study used
only preschool children, and there are
indications that the levels she used
may be less successful with older
children. For example, Simpkins
(1948), using 5-9 year old subjects, 
tried to use Pintler’s high interaction 
level, but decided that the fantasy 
material was being directed by the 
experimenter too much, so she em- 
ployed a more nondirective attitude. 
E. Z. Johnson (1951) found that 
neither of Pintler’s levels was satis- 
factory for third graders, and ended 
up using an intermediate level. 

The effects of many potentially 
influential variables have never been 
studied. Thus far, all the studies that 
have varied the behavior of the ex- 
perimenter have revealed differences 
as a consequence. Since it appears 
that the experimenter is necessary to 
encourage verbalization of the sub- 
jects, his role becomes crucial. Is 
there a way to standardize “the doll
play experimenter”? Or is it more
profitable to partial out his influence
on the results? It seems to us that the
answer to these questions awaits the
demonstration of significant differences
attributable to, e.g., the
“warmth” of the adult, a dimension
suggested by workers in the field to
be of importance. Only after such
characteristics have been identified
objectively can a decision be made as
to how best to control them.




RELIABILITY AND VALIDITY 


Consistency of Behavior


In this section it is not our inten- 
tion to discuss scoring reliability, 
since this form of reliability, as in all 
observational procedures, depends on 
the explicitness of the definitions of
the observation categories and on the 
adequacy of observers’ or coders’ 
training. Nevertheless, it surprised 
the reviewers how often researchers 
neglected the common research pro- 
tocol of reporting observer reliability. 
In many cases, this simple omission 
makes the appraisal of results diffi- 
cult. 

Two kinds of information exist
about the consistency of a subject's
behavior in doll play: the comparison 
of scores across two or more sessions 
(analogous to test-retest reliability) 
and comparisons between early and 
later portions of a single session (as in 
split-half reliability). As with so 
much of doll play data, most informa- 
tion exists for the aggression variable. 
The session-to-session correlations in
either amount or percent aggression
vary from .50 to .85 with a median
intersession correlation of about .65 
(Ammons & Ammons, 1949; Gewirtz, 
1950; Levin & Sears, 1956; Sears, 
1951; Stamp, 1954; Yarrow, 1948). 
These correlations when interpreted 
against a background of varying 
observer reliabilities and session-to- 
session changes in the incidence of 
aggression (see below) indicate quite 
acceptable reliability. It should be 
kept in mind that test-retest relia- 
bilities of more highly standardized 
tests of intelligence are within the 
same range for children of this age. 
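
An intersession correlation of this kind is an ordinary product-moment correlation between children's scores in two sessions. The sketch below uses invented scores for five hypothetical children (not data from any of the studies cited) to show that the coefficient can stay high even when every child is more aggressive in the second session, because it reflects relative standing rather than the level of aggression.

```python
from math import sqrt
from statistics import mean


def pearson_r(x, y):
    """Product-moment correlation between two equal-length score lists."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


# Hypothetical aggression scores for five children (illustrative only).
session_1 = [2, 5, 1, 8, 4]
session_2 = [4, 6, 3, 9, 8]   # no child scores lower than in session 1

r = pearson_r(session_1, session_2)
print(round(r, 2))
```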

Ammons and Ammons (1949) in a 
structured doll play situation, report 
corrected split-half reliabilities for 
aggression of .77 for the first session 
and .75 for the second. 
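
The source does not say which correction was applied to these split-half values. Assuming the conventional Spearman-Brown formula, the relation between the correlation of the two halves and the corrected full-session reliability can be sketched as follows; the function names are ours.

```python
def spearman_brown(r_half: float) -> float:
    """Step a half-session correlation up to an estimated full-session reliability."""
    return 2 * r_half / (1 + r_half)


def half_correlation_for(r_full: float) -> float:
    """Invert the formula: the half-half correlation implied by a corrected value."""
    return r_full / (2 - r_full)


# Under the Spearman-Brown assumption, a corrected value of .75
# corresponds to a correlation of .60 between the two halves.
print(round(half_correlation_for(0.75), 2))
```

If a different correction was in fact used, the implied half-session correlations would differ.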

For other than doll play aggression, 
Bryan (1940) reports a more holistic 
appraisal of behavioral consistency 
in doll play, wherein a graduate 
student matched protocols for two 
sessions at better than chance level. 
The intersession period in Radke’s
study (1946) was 4-5 weeks. The
consistency in specific categories
ranged from 29% for dominant 
themes to 67% for such variables as 
attitude toward the mother. 

There are, unfortunately, too few 
reports of the consistency of behaviors 
other than aggression to appraise the 
reliability of other variables. 

Validity 

Doll play shares with expressive- 
projective techniques certain serious 
problems in the determination of 
validity. Take aggression as an example.
Since aspects of this behavior
are disapproved in real life and since
doll play presumably reduces these
social restraints, it may be expected
that children high in the inhibition of
real aggression may be especially
aggressive under make-believe circumstances.
Were the result a substantial
negative correlation between
real life and fantasy aggression, the
purposes of validity would still be well
served. However, in any group of
children we may expect variations
among children in the amount of
aggression anxiety, and so there may
be no negative or positive relationships
between real and fantasy aggression.

The problem is the ubiquitous one 
of whether doll play behavior is 
replicative of real life or wish fulfilling 
in relation to real life. Bach (1945)
estimates that more than 75% of
children’s doll play responses are replicative,
and the writers’ experiences
tend to support this contention. This
further complicates matters because
the validity problem would be more
amenable to solution if a child were
consistent in one mode or the other,
whereas he probably varies even
within a single session. These problems
will be taken up again later.
Against this pessimistic backdrop,
we may inspect the validity of doll
play behavior against the following
criteria: observation of real life behavior,
teacher’s ratings, other measuring
techniques, and questioning
the child.

Observation of real life behavior. 
Several impressionistic accounts do 
not agree. Despert (1940) reports 
that doll play home life had “associated
emotional expressions not in all
cases in accordance with the observa- 
tions made on their overt social be- 
havior (family or group)” (p. 25). 
By contrast, Miller and Baruch 
(1950) and Henry and Henry (1944) 
say that various types of aggression 
and sibling rivalry are congruent 
between doll play and real life. 

In a well worked-out study, Isch 
(1952) compared behavior in doll 
play during four sessions with the 
observations of mother-child inter- 
action in two half-hour sessions. The 
correlations tended to be low for
equivalent categories (around r = .20),
but Isch believed that fantasy
tended to reproduce real life. For
example, when the mothers were
highly rejecting and highly aggressive 
the children represented the mother 
doll as aggressive. In general, aggres- 
sion was more severe in fantasy than 
in real life, e.g., burning a doll in the 
stove. 

Two other studies are relevant. 
R. R. Sears (1947), relating several 
studies, reports a complicated relationship
between aggression in nursery
school and in doll play. Children
who were least aggressive in preschool 
exhibited both extremes in doll play 
aggression, the determining factor 
being how severely the subjects were 
punished at home for aggression. 
Heinicke (1956) says that there is a 
generally good correspondence be- 
tween nursery school behavior and 
actions in doll play by 2-year-olds, a 
younger group than is usually employed
in doll play research. However,
it should be remembered that
2-year-olds do not engage in doll
play, in the usual sense.

Teacher's ratings. The relationship 
between teacher’s ratings and doll 
play behavior is unclear for several 
reasons. For one, the results themselves
are contradictory. For another,
where the teacher is not rating
actions similar to those manifested in
doll play but is providing data for 
predicting doll play behavior, the 
findings are usually rationalizable, 
post hoc, by common sense or by one 
theoretical scheme or another. In 
line with the latter point, for example, 
Bach (1945) reports that children 
rated as “compliant” by their teacher
had, in doll play, more fantasies about 
school, more stereotyped fantasies, 
and were less aggressive toward the 
teacher doll. These, and other find- 
ings like them, seem reasonable, but 
before we accept them as evidence of 
formal validity, the specifications for 
the rejection of the hypothesis are 
necessary. Unfortunately, the state 
of theory in personality development
is not yet able to provide such specifi- 
cations. 

The problem of replication versus 
wish fulfillment particularly troubles 
the interpretation of the relationships 
between aggression in the classroom 
and in the fantasy situation. One 
prediction, based on a theory of dis- 
placement, is that docile children in 
the classroom will be aggressive in the 
fantasy situation, but the prediction 
must further involve the manner in 
which the child’s real life aggressions 
are handled. Restrictive classrooms 
appear, for instance, to depress fan- 
tasy aggression (Levin, 1955). 

As with so much of the doll play 
data, the findings of different re- 
searchers do not agree. Bach (1945) 
found that children rated as “normally
aggressive” showed less thematic
aggression than did either of
the extreme groups. Isch (1952), at
least during the first three of four doll
play sessions, found just the opposite.
By the fourth session, subjects rated
as “strongly aggressive” showed the
most fantasy aggression. Korner
(1949) found no relationships be- 
tween teacher ratings of hostility and 
the manifestation of hostile actions in 
doll play. Bach (1945) may have a 
resolution to this dilemma when he 
reports that there is a closer corre- 
spondence between rated and fantasy 
behavior for those children who 
“identified” with a doll—called it 
“I,” or protected it, etc. 

Two impressionistic attempts at 
validation disagree so completely that 
they do little more than confuse the 
issue. In Bryan’s study (1940) 
teachers could match complete pro- 
tocols of doll play with the appropri- 
ate children accurately only in one 
out of 20 attempts. By contrast, 
Walsh (1956) reports 90% agreement 
between doll play and teachers’ rat- 
ings on such variables as freedom of 
action, freedom and adequacy of 
emotional expression, and response 
to environmental stimuli. 

Relationship of doll play to other 
measuring techniques. The several 
studies of predictive validity give a 
generally more hopeful account of the 
validity of various doll play measures. 
Ryder (1954) reports that behavior in 
doll play agrees with that in balloon 
play and blocking; Simpkins (1948) 
found that when the Ames picture 
stories and doll play were scored on 
the same categories the agreement 
was high, although there were more 
responses—many of them nonthe- 
matic—in doll play than in the story 
situation. Witkin et al. (1954) in a 
study different from most using doll 
play, found that children who ex- 
hibited much organization in fantasy 
play tended to be able to resist field 
influences in perception as ascertained
by tilting-room-tilting-chair tests and
by rod and frame tests. 

Radke (1946) strikes the only 
dissident note in this rubric of valid- 
ity, among the authors who treat 
predictive validity. She failed to find 
relationships between doll play and 
projective picture identifications 
which she used as part of a large 
battery of measures on preschool 
children. 

Doll play and direct questioning of 
children. Many of the same factors 
which make doll play data difficult to 
understand also influence the ways in 
which children answer interview ques- 
tions, so that the relationships—or 
lack—between the two must be 
treated cautiously. The agreement in 
responses to the two questions, Which 
parent do you like best? and Which 
one does the little boy (doll) like 
best? ranges from 25% to 63%. 
When the inquiry is phrased as 
“Which doll loves other most often?” 
the agreement between the answer 
and nonstereotyped doll play goes up 
to 68.4% (Graham, 1955). The 
closer correspondence of the second 
study is reasonably attributed to the 
likelihood that the child was reporting 
about his doll play performance 
itself rather than about the ante- 
cedents of the fantasy. 

In summary, the findings on valid- 
ity are not substantial, if by validity 
we mean the correspondence between 
doll play and nonfantasy behavior. 
On theoretical grounds, strong con- 
gruence should not be expected. 
More definitive tests of validity must 
take the form of construct validity 
which in turn waits on clear and 
unequivocal hypotheses. 


AREAS OF RESEARCH 


One of the qualities of doll play 
which has made it attractive to re- 
searchers is the flexibility with which 
it can be adapted to different content 
areas. Modifications of equipment
and procedure have made it possible
to study a great variety of human
problems “in miniature.” Among the
areas of doll play research are the 
following: constructive-destructive 
tendencies (Ackerman, 1937), father 
fantasies of delinquent children (Bach 
& Bremer, 1947), evaluation of play 
therapy (Bixler, 1942), car sickness 
(Conn, 1938), concepts of parental 
roles (Finch, 1955), sibling rivalry in 
American (Levy, 1936) and in Pilaga 
Indian children (Henry & Henry, 
1944), aggression and aggression anx- 
iety in accident repeaters (Krall, 
1953), hostility in allergic children 
(Miller & Baruch, 1950), adult schizo- 
phrenia (Rosenzweig & Shakow, 
1937), reactions to the discovery of 
genital differences (Conn, 1940; Levy, 
1940), self-concepts of underachievers
(Walsh, 1956), and achievement and 
work fantasies of industrious children 
(Melville, 1959). 

In fact, the number of variables 
that have been educed from doll play 
is so great that we cannot catalog all 
of them. Instead, five problems 
which have been investigated exten- 
sively will be discussed below in an 
attempt to summarize the present 
state of information about them and 
to illustrate some measuring problems 
commonly found in the use of the doll 
play technique. The areas are aggres- 
sion, stereotypy, doll preference, 
father absence, and prejudice. 

Aggression

Far more than any other behavior, 
aggression has been investigated by 
doll play techniques. The investigator
has assurance that at some
point in the doll play procedure, a 
substantial number of children will 
evidence some aggressive acts. The 
behavior may be verbal, or acting 
out the aggression, or a combination. 
Conceptually, aggression has been 
defined as any act whose intent is to
injure, physically or psychologically,
another doll or equipment. Opera- 
tionally, this common definition pre- 
sents certain difficulties. First is the 
inference of intent. This part of the 
definition is designed to eliminate 
accidental aggressive acts. Since the 
child is manipulating dolls and furni- 
ture in a small space, he will from 
time to time knock over a doll or a 
piece of equipment without apparently
meaning to. Investigators
often want to ignore such fortuitous
acts, and, in fact, it is not difficult to
distinguish such accidental acts from
“intended” aggression. It seems to
us that the problem in operation is
far less serious than is the inclusion of
intent in the formal definition.

Another category of events not 
covered by the definition but often 
scored as aggression is the attribution 
of motives or traits by the subject to 
a doll; e.g., the boy is bad, the
mommy is mean. One way of han- 
dling this contingency is to include the 
subject as a scorable agent of aggres- 
sion and to count the above two 
examples as aggression from the sub- 
ject to the appropriate doll. 

In fact, a major virtue of doll play 
is the freedom it provides the investi- 
gator to design a scoring system that 
fits his problems. The many specific
categories which have been scored
under the general aggression rubric 
are illustrated in the middle column 
of Table 1. They include total ag- 
gression, verbal and physical aggres- 
sion (often interpreted as indirect 
and direct), mischief, scolding, tangential,
displaced, projected, etc.
The latency of the first aggressive act 
in the session has been studied, and 
usually interpreted as an index of 
aggression anxiety. The agents and 
objects of aggressive acts are popular 
topics of study. A generalization,
though, is that when many sub- 
categories of aggression are scored, 
the incidence in any one category is 

TABLE 1


SUMMARY OF STUDIES ON AGGRESSION 


Author Measures of aggression 


Reactions to aggression 
Counter-aggression 
Leaving field 
Verbal expression 
Appeal to adult 
Inhibition of aggressive feel- 
ings 
Outcome of aggression 


Ammons & Ammons 
1953 


Success 
Failure 
Objects of aggression 
Negro doll 
White doll 


Bach, 1945 


*Hostility-aggression (% of non- 
stereotyped) 

Agent of aggression 

*Object of aggression 


Bach, 1946 


oft total 


involving father doll 


*Total aggression 
acts 
*Agent of aggression 
Object of aggression 


Bach & Bremer, 1947 *Fantasy aggression (f 
Killing 
Justification of ho 
sion 
Defensive rationalization 
Aggression in response to com 
mands 
*Father doll as agent or obje 


t 


Bremer, 1947 *Total aggression (frequen 
Nonstereotyped aggression 
Stereotyped aggression 
Suffered 

ing suffering in pain 


aggression (express- 
Chasing or escaping 
Justification of aggression 
Nonthematic aggression 
*Agents of aggression 
Objects of aggression 


Caron & Gewirtz, 1951 *Total aggression (°% of total 
acts) 
*Latency of aggression 
Projection of aggression 
Agent of aggression 
Object of aggression 
Gewirtz, 1950 *Total aggression (% of total 
acts) 
Direct 


* Involved in relationships significant at .05 or better. 


Independent variables 


Age of subject 


*Sex of subject

*Teacher ratings on aggression 
and compliance in school 

*Frustration before doll play


*Father 
*Sex of subject

Mothers’ reports of their descriptions
of absent fathers to


separation 


children 
*Delinquency of subjects (home 


for prepsychopathic children) 


*tting compared to 


*Sex of experimenter 
*Sex of subject 
*Age of subject 


*Sac and Fox Indians compared 
to white children 


*Age of subject 
TABLE 1 (Continued)





Author Measures of aggression 


Physical injury 
Indirect 
Verbal aggression 
Discipline 
Discomfort-causing 
Agent of aggression 
Object of aggression 
Projection of aggression 
Displacement of aggression 


Independent variables 





*Sex of subject 
*Session-to-session changes 


Response attenuation (ratio 


of direct to indirect) 


Gewirtz & Caron, 1954 *Physical injury (% of total acts) *Sex of experimenter


Latency of aggression 
Agent of aggression 
Object of aggression 


*Sex of child 
Session-to-session changes 


Hollenberg & Sperry, *Total aggression (% of total Frustration at home (mother in- 


1951 
Hollenberg, 1949 
Sperry, 1949 


acts) 


ence 


Verbal discipline & aggression


scold, threaten 
der gation 
Physical discipline 


feelings 


Physical injury to person 


Agere ssive mischief, disobedi- 


terview) 
*Punishment for aggression at 
home (mother interview) 
*Sex of child 
*Session-to-session change 
*Disapproval of aggression during 
experimental session 


Physical injury to equipment 
States of uncomfortable feel- 


ing 
*Projection of aggression 
Intensity of aggression 


Holway, 1949 


Total aggression (frequency) 


*Total aggression (frequency) 


Jeffre, 1946 
Isch, 1952 *Total 


acts) 


aggression (% of 


Agent of aggression 
*Object of aggression 





*Total aggression (frequency) 
*Aggression mischief
*Verbal aggression 


Johnson, 1951 


> 


*Physical injury to per- 
son or equipment 


total 


*Contra- 
social 


Mother interview data on 

(a) Strictness of feeding 

schedule 

(b) Mothers feeling tone on 
feeding schedule 
Number of months breast 
fed 
(d) Age begin toilet training 


(c) 


*Teacher ratings on aggression 

*Session-to-session changes 

Observed mother-child interac- 
tion (rejection, aggression) 





*Age of subject 
*Sex of subject 
*Session-to-session changes 


*States of uncomfortable feeling 


Verbal discipline |p ‘ 

> . "lel he }*Prosocial 
Physical discipline} 
*Agent of aggression 
*Object of aggression 
TABLE 1 (Continued)





Author Measures of aggression 





Korner, 1949 Total aggression (frequency) 
(Listing of ty pes 


observed ) 


of aggression 


Krall, 1953 
Krall, 1951 


*Total aggression (frequency 
Action aggression 
*Verbal aggression 
Aggressive anxiety 
Differences between verbal ag- 
gression & action 
sion 
*Latency of aggression 
Inhibition of aggression 
Displaced aggression 
Projected aggression 


ageres- 


Levin 1953 


*Total aggression (‘ 


acts) 





Levin, 1955 


acts) 





*Total aggression (% of total 
acts) 


Levin & Sears, 1956 


Levin & Turgeon, 1957 *Total aggression 
acts) 


Levy, 1936 Prevention of hostility 
Direction of hostility (order of 
attack on different objects 
Forms of hostility 

Mild 

Simple assault 

Primitive hostility 
Self-punishment & retribution 





*A cy 


*Sex of sul 


*Severity 





Independent variables 


Content of incomplete story 
Hostility ratings based on parent 
& teacher ratings 


ident compared to 


accident free children 


prone 


*Sex of subject 


Session-to-session changes


ject 
of punishment of aggres- 
sion at home (mother inter- 
view) 
-to-session changes 
fighting in class 


*Sex of subject

*Dominance-control of classroom 
teacher (observed) 

Session-to-session changes 


*Sex of subject

*Identification 
(mother interview) 

il punisher (mother 


interview ) 


with parent 


*Sex of usu 
punishment for ag- 
gression toward parents 
(mother interview) 


*Session-to-session changes 


Severity of 


*Ordinal differences 


Socioeco! status 


LOTTLIC 


*Mother’s presence at doll play 


session 


*Stranger’s presence at doll play 
session 
*Sex of subject


Sibling rivalry problems of sub- 


jects 
TABLE 1 (Continued)


Author Measures of aggression 





Presence-absence of 
*Direct hostility 
*Indirect hostility 

Displaced hostility 
*Hostility against self 


Miller & Baruch, 1950 


Phillips, 1945 


*Total aggression (frequency) 


Pintler, 1945 


*Total aggression (frequency ) 
Latency of aggression 


Pintler, Phillips, & 
Sears, 1946 


Robinson, 1946 


*Total aggression (frequency 


Total aggression (fre 
Stereotyped 
Nonstereotyped

Agent of : 

*Object of ag 


jyuency 


iggression 
gression 


Ryder, 1954 


*Rating of aggressive feeling 

Rating on inhibition of aggres- 
sive feeling 

Total aggression (frequency 


Scott, 1954 Total aggression (frequency 
*Agent of aggression
*Object of aggression
Sears, 1951 *Total aggression 
Nonthemati 
Thematic 
*Bodily injury 
*No bodily injury 
Nonpersonal aggression (by 
dolls toward nonpersonal 


\ 


ol jet ts) 
T rouble 
mons, catastrophes, 


f 


as result of 
or 
imaginary characters 
*Latency of aggression 
Agent of Aggression 
*Object of aggression 


Sears & Pintler, 1947 


Agent of aggression 
*Object of aggression 
Content of aggression 


Sears, Pintler, & Sears, *Total aggression (frequency) 
Agent of aggression 
*Object of aggression


Independent variables 


*Allergic compared to nonallergic 
problem children 


Realism of materials 
Session length 


*Session-to-session changes 


*Experimenter-subj 
tion 
*Organization of materials 


*Session-to-session changes 


ect interac- 


*Sex of subject


*Type of doll family: standard or 
ite of subject’s family 
absence of sibling in 


dupli 
Presence or 


subject's family 


*Father separation 


*Sex of subject 


*Separation from parents 


*Sex of subject
*Age of sul 


*Sibling status 


yect 


+ Sa , 
Father absence 


*Session-to-session changes 


*Sex of subject


*Sex of subject 

*Age of subject 

*Father separation 
*Session-to-session changes 





Author 


Stamp, 1954 
(story completions) 
Measures of aggression


*Direct (% total aggression: self 
doll—parent) 

*Indirect (% total aggression: self 
doll, with implied intention; or 
—parent but not by self) 

Directed—self (% total aggres- 
sion ) 

*Displaced (remaining 
aggression ) 


w/ total 


Agent of aggression 
Object of aggression 
*Total aggressive acts 
*N sreotyped aggressi 
Nonstereotyped aggressi 
*Stereotyped 
*Tangential aggression 




Independent variables 
*Teacher ratings of subjects as 
“‘rebellious’’ or “‘submissive”’ 
*Sex of subject 
Session-to-session changes 


*Sex of subject 
Experimentally induced frustra- 
tion, antecedent to doll play 
(a) Failure 
(b) Satiation 
*Session-to-session changes 


*Latency of aggression 


so small that they are combined for
purposes of analysis into large groupings
such as “total aggression” or
“direct” and “indirect” aggression,
etc.
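
As a rough illustration of how sub-categories are pooled into indices such as “total aggression (% of total acts)” and “latency of aggression,” the sketch below scores one coded session. The category labels and the use of act position as the latency measure are assumptions for the example; the studies reviewed used their own code books and units.

```python
from typing import List, Optional, Tuple

# Assumption: illustrative sub-category labels, not a published code book.
AGGRESSIVE = {"verbal", "physical", "mischief", "scolding"}


def score_session(acts: List[str]) -> Tuple[float, Optional[int]]:
    """Return (percent of acts that are aggressive, position of first aggressive act)."""
    if not acts:
        raise ValueError("no acts coded")
    flags = [act in AGGRESSIVE for act in acts]
    percent = 100.0 * sum(flags) / len(acts)
    latency = flags.index(True) + 1 if any(flags) else None  # 1-based act position
    return percent, latency


# Hypothetical coded sequence for one child.
acts = ["neutral", "neutral", "verbal", "neutral",
        "physical", "mischief", "neutral", "neutral"]
percent, latency = score_session(acts)
print(percent, latency)
```

Pooling the three aggressive sub-categories gives one usable score, where each category alone would contribute only a single act.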

The tendency to proliferate basic 
categories and to recombine them 
into various indices presents a diffi- 
cult problem for comparing and 
evaluating studies. Since a large 
number of combinatorial indices are 
possible from a few basic variables 
and since experimenters choose for 
theoretical or other reasons to form 
different combinations, studies which 
should be comparable are not. The 
evaluator is also tempted to think
that many of the combinations and
arithmetic manipulations of scores
were reached post hoc and to wish for
replications of findings.
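
The size of the problem can be made concrete by counting. If an index is taken to be any sum of two or more basic categories (a simplifying assumption; ratios and differences would add still more), then k categories allow 2^k - k - 1 distinct indices. The six category names below are illustrative.

```python
from itertools import combinations


def composite_indices(categories):
    """All unordered sets of two or more basic categories that could be summed into one index."""
    return [combo
            for size in range(2, len(categories) + 1)
            for combo in combinations(categories, size)]


basic = ["verbal", "physical", "mischief", "scolding", "displaced", "projected"]
n_indices = len(composite_indices(basic))
print(n_indices)  # equals 2**6 - 6 - 1
```

Even with six basic categories, two investigators choosing indices independently have dozens of options, which is why nominally similar studies often cannot be compared directly.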

As illustrative of the large amount 
of research on aggression, only a few
topics will be discussed in detail: sex 
and age influences; session-to-session 
changes; and the child rearing ante- 
cedents of total, displaced, and pro- 
jected aggression. 

Age, sex, and aggression. The single
best documented finding using the
doll play technique is that boys are more
aggressive than girls. Still, in spite
of the overwhelming evidence on this
point there are a few contradictory or
nonconfirmatory results. Krall (1951)
reported more aggression among the
girls in her sample, but a careful
check on her data is best interpreted
as no rather than reversed sex differences.
Likewise, Henry and Henry
(1944) reported no sex differences in
aggression for Pilaga Indian children,
and Hollenberg and Sperry (1951)
found none among Iowa City nursery
school subjects. Since the findings
are so overwhelmingly in the other
direction the burden of explaining the
dissenting results must fall on these
few investigators.

E. Z. Johnson (1951) adds an important
result to the repetitive “boys
more than girls” data. She found
that boys do exceed girls in physical
aggression, but that girls show more
verbal aggression than do boys. This
finding is reasonable in light of the
findings on overt (nonfantasy) aggression.

Johnson’s finding in regard to age
of the subjects is also provocative.
Younger children show more of what
she calls “contrasocial” aggression,
while older children’s aggression is
more “prosocial,” usually depictions
of the parents punishing the children.
As one compares the 5- with the 8- 
year-olds, the usual sex difference in 
aggression decreases. Caron and 
Gewirtz (1951) confirm this finding. 
P. S. Sears (1951), on the other hand, 
found that the sexes become more 
different in this respect as they are 
older, but it must be remembered that 
her subjects ranged in age from 3 to 5, 
which is younger than the youngest 
group in the other two studies citing 
age differences in fantasy aggression. 

Session-to-session changes in doll 
play aggression. Second only to the 
consistent finding that boys are more 
aggressive than girls, is the ubiqui- 
tous result that children are more 
aggressive in the second compared to 
the first session of doll play (e.g., 
Hollenberg & Sperry, 1951; Levin & 
Sears, 1956; Sears, 1951). Although 
the amount of aggression increases, 
children tend to maintain their rela- 
tive rank order in aggressiveness 
(Sears, 1951). The above findings 
apply to the first two sessions. When 
children participated in more than 
two sessions, aggression in the later 
sessions presented a more compli- 
cated picture. For example, Jeffre 
(1946) reported that across four 
sessions, aggression toward the ex- 
perimenter and equipment increased, 
which seems a likely reflection of a 
child’s frustrated boredom with the 
doll play task. Pintler (1945), using 
three sessions, found that the latency 
of the first aggressive act decreased 
in the later sessions. 

The increase in aggression appears 
to be related to amount of experience 
in doll play and not particularly to 
the interval between sessions. Phillips 
(1945) compared doll play perform- 
ance in a single one-hour session 
to three 20-minute sessions. The 
changes that occurred between the 
first and final thirds of the massed
session were similar to those between
the first and last distributed sessions. 

A reasonable explanation of this 
common finding is that the child 
learns with time that the restraints 
against the expression of aggression 
are not operative in doll play and 
hence he may vent his impulses more 
freely. The fact that when a stranger 
is introduced into a second session 
the usual increase in aggression does 
not occur lends experimental cre- 
dence to this interpretation (Levin & 
Turgeon, 1957). 

Child rearing antecedents of doll 
play aggression. The hypotheses re- 
lating certain child rearing practices 
to aggression in doll play have come 
from psychoanalytic and behavior 
theory. The setting of doll play is 
thought of as a situation relatively 
free from real life restraints and so 
appears on a similarity dimension 
with home and school, but different 
enough from the real life settings so 
that the restraints against aggressive 
expression are less potent. If, there- 
fore, aggression is punished at home, 
such actions are less likely to occur at 
the point of punishment but will be 
manifest in the safety of doll play. 
The general hypothesis has been that 
there is positive correlation between 
severity of punishment at home and 
the incidence of aggression in doll 
play. One shortcoming of the dis- 
placement hypothesis is that it does 
not predict a higher frequency of 
incidence in doll play than the less 
severely punished condition, only
that such denied behaviors will appear
in fantasy but not in real life. A
conflict drive hypothesis has been
added to the original displacement 
one to cover this lack (Whiting & 
Child, 1953, p. 353). This additional 
hypothesis postulates a drive incre- 
ment due to the subject’s desire to 
express aggression and his fear of 
such expression. Since drive operates 
multiplicatively, the combination of 
hypotheses covers the prediction of
severe parental punishment leading 
to high fantasy aggression. 

What is the evidence for this hy- 
pothesis? Hollenberg and Sperry 
(1951) reported confirmation. In an 
earlier summary report of research 
performed under his direction, R. R. 
Sears (1947) found the predicted 
state of affairs: those children who 
were most severely punished at home 
were most aggressive in doll play. 

An attempt to replicate this finding 
further entailed fairly elaborate 
changes in the hypothesis (Levin & 
Sears, 1956). On a larger and more
varied sample (the previous studies
were done with university nursery
school groups), the simple “punishment
leading to aggression” hypothesis
did not hold. Rather, doll play
aggression was shown to be predictable
from a combination of the sex of
child, the real life agent of punishment,
and the nature of the child’s
identification with his parent, as well 
as the severity of punishment. In 
general, these findings lend them- 
selves more easily to a replicative 
rather than to a displacement inter- 
pretation. Taken in the light of E. Z. 
Johnson’s (1951) finding that older 
children evidenced more prosocial
aggression, the more mature, identi- 
fied children may be portraying the 
parental punishment that they have 
experienced. It is interesting to note 
that real life aggression among pri- 
mary school children was predictable 
from much the same variables as the 
fantasy behavior in the Levin and
Sears study (Eron, 1958).

One other study based on the dis- 
placement hypotheses obtained com- 
pletely contradictory results (Levin 
& Turgeon, 1957). The prediction 
that the presence of the mother at 
the doll play session would redinte- 
grate aspects of the home and reduce 
the freedom of doll play was not 
borne out. The opposite finding
emerged: aggression was more frequent
in the mother’s presence than in
the control condition. The investigators
called the original hypothesis
into question and suggested that 
there are characteristics of the doll 
play situation which make doubtful 
its use as a point on a simple freedom- 
from-inhibition dimension. 

Wurtz (1960) in a recent theoretical
statement argued that mild aggres- 
sion anxiety should facilitate attenu- 
ated aggression in doll play. He 
found some confirmation for this 
notion in a reanalysis of earlier data 
reported by P. S. Sears (1951) and
Sears, Pintler, and Sears (1946), when
the index of attenuation is
based on the use of child compared 
to adult dolls as agents and objects 
of aggression. 

In addition to thinking of total 
doll play aggression as a manifesta- 
tion of displacement, the same phe- 
nomenon has been studied within the 
doll play situation itself. If a child 
has been punished for aggression, the 
depiction of this punishment in doll 
play should arouse more anxiety than 
in cases of aggression toward a doll 
less similar to the performer. This 
conceptualization creates substantial 
difficulties. It implies that although 
doll play in general is not very in- 
hibiting, there is still sufficient anxi- 
ety to influence the choice of dolls 
that act as the objects of hostility. 
We might expect, therefore, a mild 
and not very consistent effect on the 
choice of objects of aggression. The 
unreliability should be compounded 
by the low incidence of acts which 
determine any displacement score. 
For example, if 15% of all doll play 
units are aggressive and this percent- 
age is divided among five dolls 
equally, we are dealing with expected 
displacement scores of 3% of the 
total number of acts, and the unreliability
of this minuscule proportion is
obvious.
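
The arithmetic behind this example can be made explicit with a short computation. The sketch below is only an illustration and is not drawn from any of the studies reviewed: it assumes that scoring units behave as independent binomial trials, and the session length of 100 units (`n_units`) and the function name are hypothetical choices made for the example. Under those assumptions, an expected displacement score of 3% carries a standard error of roughly 1.7 percentage points, more than half the size of the score itself.

```python
import math


def displacement_score_se(p_aggressive, n_dolls, n_units):
    """Return the expected per-doll proportion and its binomial standard error.

    Assumes aggressive units are spread equally over the dolls and that
    units are independent trials (a simplifying assumption, not a claim
    made in the text).
    """
    p_doll = p_aggressive / n_dolls
    se = math.sqrt(p_doll * (1.0 - p_doll) / n_units)
    return p_doll, se


# 15% aggressive units divided equally among five dolls (figures from the text).
# n_units = 100 is a hypothetical session length chosen for illustration.
p_doll, se = displacement_score_se(0.15, 5, 100)
print(f"expected proportion per doll: {p_doll:.3f}")
print(f"binomial standard error:      {se:.3f}")
```

Longer sessions shrink the standard error only with the square root of the number of units, so quadrupling the session length would merely halve it.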


The implications of the displacement
hypothesis for understanding
pro- and contrasocial aggression are 
especially difficult to justify. If 
severely punished, highly identified 
children accurately replicate their 
parents’ punishment in doll play, 
they would be showing little displace- 
ment although they have experienced 
severe punishment. 

The final comment on doll play 
analysis of displacement is, to our 
thinking, most serious and applies
equally to the discussion of
projection below. How are the doll agents
or objects of aggression to be ordered 
for the analysis of the two defense 
mechanisms? Most often, the assump- 
tion is made that the child uses the 
doll most similar to himself as the 
point of origin on the similarity 
continuum. For children who are 
strongly identified with their parents, 
this assumption is suspect. Granted 
this point, however, the additional 
points create greater difficulties. 
Should the grouping be by sex or age?
Does the dimension for a girl go: girl
(G), mother (M), boy (B), father
(F), baby (bb); or G, B, M, F, bb; or,
perhaps, G, M, F, B, bb; etc.? All of 
these are empirically answerable, 
albeit difficult, questions. One pos- 
sibility is that the dimension is an 
idiosyncratic, response mediated one. 
Another tack may be that the nature 
of the dimension varies depending on 
the behavior being studied; i.e., one 
sequence of dolls for aggression, an- 
other for dependency, etc. As will be 
pointed out in the final section of this 
paper, there is little evidence that can 
be brought to bear on these questions. 

An analog to displacement within 
the doll play session is “projection,” 
which is defined in terms of the doll 
agents of aggression. Presumably, a 
doll most similar to the child carrying 
out hostile acts represents projection 
of the child’s hostile impulses to the 
doll. The above comments on displacement
also apply to this mechanism.

A number of studies relate the 
agents and objects of aggression to 
demographic characteristics of the 
child, as can be seen in Table 1. How- 
ever, a direct test of the displacement 
or projection formulations requires 
information about the nature of the 
child’s aggression anxiety as well as 
the dolls he chooses to initiate and 
receive hostile acts. Only one study 
yields this information directly (Hol- 
lenberg, 1949). She found that chil- 
dren who were severely punished for 
aggression at home projected aggres- 
sion more in doll play than did less 
severely punished children. Com- 
parable data on displacement are not 
available. 

In summary, the demographic and 
practice correlates of doll play ag- 
gression are clear and substantial. 
However, the problems of greater 
theoretical interest—the child rearing 
correlates of doll play aggression— 
must, because of their conceptual 
unclarities and inconsistent results, 
remain open questions. A thorough 
test of the displacement model would 
require information about the anxi- 
ety attached to the expression of 
aggression at home, the amount of 
such behavior actually exhibited at 
home, the instigation to aggression, 
and the amount of aggression shown 
in doll play. A questionable assump- 
tion is that the instigation to aggres- 
sion is more or less the same in doll 
play as in the home—that it is a 
characteristic of the person inde- 
pendent of the situation. No single 
study fulfills more than one or two of 
these requisites. 


Stereotypy 


Many doll play studies have categorized
routine, habitual actions,
“doll action appropriate to the time,
place, situation, and characters involved”
(Phillips, 1945). These behaviors
are most often termed “stereotyped,”
although they sometimes
have been labeled “realistic” or
“routine role.” Such doll actions
usually constitute a considerable part 
of the total acts in a session. Krall 
(1953) reports that stereotyped the- 
matic responses constitute 45% of all 
responses made in doll play, and 
Bach (1945) and E. Z. Johnson 
(1951) report that 59% and 66% of 
thematic responses are stereotyped 
actions. 

The most consistent finding with 
regard to stereotypy is a sex differ- 
ence: girls show more stereotyped 
behavior than do boys (Bach, 1945, 
1946; Bremer, 1947; Honzik, 1951; 
Pintler, Phillips, & Sears, 1946; 
Yarrow, 1948). This finding might 
be attributable to the greater famil- 
iarity of girls with doll playing, but 
since it occurs from age 3 onward, it 
would seem more likely to be related 
to greater inventiveness of young 
boys. As a case in point, Tuddenham 
(1952) reports that first, third, and 
fifth graders recognize that the 
“typical girl” is less daring than the
“typical boy.” 

The amount of stereotypy de- 
creases from session to session (Bach, 
1945; Phillips, 1945; Yarrow, 1948), 
a fact which may be explained in 
several ways. The higher incidence
of aggression in later doll play sessions
may displace stereotyped responses.
Also, the relaxation of
restraints in the second session which
yields more aggression may also lead
to more nonstereotyped, nonaggressive
behaviors. It seems natural that
a child faced with a new situation
would first represent the most highly
practiced behaviors, the routine acts
of the home.

Attempts have been made to relate
stereotypy to adjustment. Bach
(1945) reported that children whom
teachers rated as being “well adjusted”
showed a higher rate of decrease
of stereotypy over sessions
than did “poorly adjusted” children.
Holway’s (1949) findings on “realistic”
play, which seems to be closely
related if not identical to stereotyped 
play, show that at the end of therapy, 
children play more realistically using 
less fantasy, aggression, or tangential 
(nondoll) play. Holway’s study at- 
tempted to relate doll play to child 
rearing variables. She found that 
realistic play was positively related 
to the amount of early self-regulation 
in feeding and the number of months 
the child was breast fed. 

In Holway’s (1949) sample of 3-5
year olds, there was no correlation
between realistic play and either CA
or IQ. However, Graham (1952),
comparing seven “bright” primary
school children with seven “dull”
ones, found that the brighter children
used more stereotyped responses.

Aside from the sex differences,
session-to-session changes, and possible
IQ differences, there have been
no other substantial findings with
regard to stereotyped play. In studies
of delinquents (Bach & Bremer,
1947), accident repeaters (Krall,
1953), and various methodological
explorations reported above (Phillips,
1945; Pintler, 1945; Robinson, 1946),
no significant differences in the
amount of stereotyped behavior were
found between experimental and control
groups. In the area of parent
separation, the results are not consistent:
Bach (1946) found that
father-separated children showed
more stereotyped fantasies about
home life, whereas Scott (1954) reported
that institutionalized children
indulged in less stereotyped play
than did children living with their
parents.

The stereotype category is usually 
regarded as a residual category rather 
than as a major interest. A recent 
study indicates that it may have 
some predictive value if further analyzed.
Melville (1959) found that
children who spend a large proportion
of their school time working industriously
use the “work routine” category
(that portion of stereotyped behavior 
which is work oriented) in fantasy 
more than do less industrious chil- 
dren. Note that this is a more or less 
direct replication in doll play of ob- 
served real life behavior. Melville’s 
study suggests that a finer break- 
down of the stereotypy category 
might be profitable. 


Doll Preference 


Although many doll play studies 
record which dolls were used as 
agents and objects of fantasy acts, 
few of them report analysis of doll 
usage in any detail. The greatest
interest in this variable has been evidenced
by researchers in the areas of
aggression and the effects of separation
from parents, and the results are
presented in the appropriate parts of
this paper.

Probably the best substantiated 
generalization to be made about this 
topic is that subjects tend to prefer 
the same sex parent doll to the parent 
doll of the opposite sex. This tend- 
ency shows some increase with age. 
The finding has not appeared in every 
study—e.g., Graham’s (1952) sub- 
jects, regardless of sex, tended to use 
the mother doll more than the father 
doll—but significantly greater use of 
the opposite sex parent has not been 
reported. E. Z. Johnson (1951) 
found that while in portrayal of 
routine (stereotyped) behavior all 
subjects used the mother more often 
than the father doll, the greatest 
session-to-session increase in the use 
of the father occurred among older 
boys. In a nursery school sample 
(Sears et al., 1953), the girls used the 
mother doll more than the father, 
while the boys used the two dolls 
equally, thereby employing the father 
doll more than the girls did. 


Five studies which have used rela- 
tively structured situations to lead 
the child to make a direct choice also 
report same sex preference. Ammons 
and Ammons (1949) found a father 
preference among 3- and 4-year-old 
boys, and a mother preference among 
4- and 5-year-old girls, and R. Lynn’s 
(1955) 6-year-old subjects showed a 
greater preference for the same sex 
parent doll than did her 4-year-old 
subjects. Emmerich (1959) had his 
subjects complete stories using first 
the adult and then the child dolls. 
Correspondence between the two sets 
of behaviors was taken to indicate 
high identification. He found that 
preschool children—especially boys 
—tended to identify more with par- 
ents of the same than with parents of 
the opposite sex. Similarly, highly 
sex-typed boys depict more nurtur- 
ance, punishment, and power via the 
father than via the mother doll 
(Mussen & Distler, 1959). To get at 
sex role identification, Rabban (1950) 
asked children aged 3-9 to select the 
doll that “looks most like you.” 
Starting at the age of 4, the choices 
were correct as to sex. 

Preschool children who have been 
reared permissively emphasize the 
adult dolls in their fantasy produc- 
tions (Levin, 1958). This finding may 
be interpreted in several ways: per- 
missive parents interact more with 
their children and thereby provide a 
more frequent adult model, parents 
who rear their children permissively 
permit them to explore and practice 
adult-like behaviors more than do 
nonpermissive parents, and permis- 
siveness is one of the antecedents of 
identification with parents which is 
reflected in the child’s preoccupation 
with adult actions in doll play. 


Effects of Separation from Parents 


Interest in this area grew out of 
the problems of wartime father separation,
and the majority of studies
have been concerned with the absence
of the father rather than the
mother from the home. The studies
of father absence can conveniently be 
divided into two groups: those con- 
cerned with children currently sepa- 
rated from their fathers, and those of 
children whose fathers had been 
absent during the first year or two of 
the child’s life but were living with 
the family at the time of the study. 
Bach (1946) studied children aged 
6-10 whose fathers were in the service 
abroad and had been away for 1-3 
years. He found that father-separated 
children, compared to a control
group whose fathers were at home, 
produced fewer doll actions that in- 
volved the father doll; enacted a 
more stereotyped view of family life; 
and made the father doll more ag- 
gressive, less authoritarian, and more 
affectionate, than did the control 
group. Using a smaller group of sub- 
jects, he found that where the mother 
described the absent father to the 
children in deprecatory terms, the 
children portrayed the father as being 
more aggressive to his doll children, 
but as receiving more affection from 
them; i.e., unfavorable typing of the 
absent father seemed to produce 
ambivalent feelings in the children. 
Another study (Sears et al., 1946; 
Sears, 1951) found that nursery 
school children whose fathers were 
absent from the home did not show 
the session-to-session increase in ag- 
gression that is usually found. In 
addition, boys (but not girls) whose
fathers were absent were less aggressive
in their fantasies (Sears et al.,
1946). The father-present control
group of boys showed most aggression
in doll play toward the father doll
and the boy doll (sex category),
while the boys without fathers showed
most aggression toward the father
and mother dolls (age category)
(Sears, 1951).

Lynn and Sawrey (1959), using the
Structured Doll Play Test, have investigated
absence of fathers in children
of Norwegian sailor families.
They found that girls (but not boys) 
whose fathers were gone were more 
dependent than the control children. 
However, on a measure of “maturity” 
(choice of sleeping in a crib or bed), 
boys without fathers were less ma- 
ture than boys whose fathers were at 
home. In contrast to other studies of 
father absence, this one also investi- 
gated the child’s relationships with 
mother, and concluded by doll play 
and other techniques that the mothers 
of father-absent children were more 
overprotective than were control 
group mothers. 

Studies of homes where the father 
is currently absent do, then, find 
substantial results. Positive results 
have not been so easy to find in 
studies in the second group—those in 
which a previously absent father is 
present in the home at the time of the 
investigation. Halnan (1950), L. C. 
Johnson (1952), and Ryder (1954) 
performed doll play studies as part of 
the Stanford University research on 
father relations of war-born children. 
Only one difference was found be- 
tween responses of control groups and 
those of children aged 4-7 who had 
been separated from their father dur- 
ing the first 2 years of life. In Ryder’s 
study the doll play of the previously 
father-separated children was rated 
as revealing more aggressive feeling. 
Since this was an inferred measure 
rated by the experimenter and an 
observer, and since measures of overt 
aggression in doll play did not show 
any significant differences in this or 
either of the other two studies, it 
must be concluded that there is little 
evidence of marked effects on the 
doll play of children temporarily 
separated from their fathers in early 
life. 

In view of the recent great interest 
in the effects on the child of separation
from his mother, it is surprising
that doll play has not been used to 
investigate this area. So far as is 
known, there has been no study using 
this technique with children living in 
households where the mother is ab- 
sent. However, there are two investi- 
gations of children separated from 
both parents. Heinicke (1956) stud- 
ied 2-year-olds living in residential 
nurseries because their parents were 
on vacation, sick, or having another 
child. He found results which agreed 
with observations of the subjects in 
their nursery life—e.g., they sought 
the affection of adults by crying— 
but, as has been observed before, 
most of his results were not specifi- 
cally concerned with doll play re- 
sponses. Scott (1954) studied children 
separated from their parents through
institutionalization for
neglect, mental illness in the
family, etc. He found that the subjects
showed a much greater than
average tendency toward “metamorphosis,”
i.e., the subject himself acted
as an authority figure and treated all 
the dolls as children. It is doubtful 
that this result should be attributed 
to parent separation as such; it seems 
just as reasonable to relate it to the 
effects of institutionalization. 


Reactions to Racial and Religious Differences


Of the studies of children’s reac- 
tions to Negro-white differences, 
several (Goodman, 1952; Graham, 
1955; Radke & Trager, 1950; Steven- 
son & Stewart, 1958) have used both 
Negro and white subjects, while one 
used only white subjects (Ammons, 
1950; Ammons & Ammons, 1953), 
and another (Clark & Clark, 1947) 
used Negro subjects exclusively. In 
this area unstructured doll play,
compared to structured, has not
produced meaningful results. Graham
(1955) recorded the free play of
Negro and white subjects with both
Negro and white dolls, but made only 
intraracial analyses of his data. There 
were no outstanding differences be- 
tween the two groups. Goodman 
(1946) used only 24 subjects in the 
part of her study involving free play, 
and found no statistically significant 
differences between Negroes and 
whites. However, she did uncover 
several trends which seem worthy of 
follow-up with a larger group—e.g., 
Negro subjects tended to assign main 
roles to white dolls, and seldom re- 
vealed positive evaluations of Negro 
dolls. In her later studies (Goodman, 
1952), where no statistical evalua- 
tions were made, she reported that 
doll play was a successful technique, 
but it is difficult to tell how much
success is attributable to the free
play method itself since it was used 
mainly as the introduction to a doll 
play interview. 

It has been much more common, 
and apparently more profitable, to 
use controlled methods of exploration 
like direct questioning about prefer- 
ence for dolls of different color (Clark 
& Clark, 1947; Goodman, 1952;
Radke & Trager, 1950; Stevenson &
Stewart, 1958), identification of race
(Ammons, 1950; Ammons & Ammons,
1953; Clark & Clark, 1947), requiring
the child to pair dolls which “go together”
(Goodman, 1952), pairing
dolls with middle class or slum
houses, and with dress-up or work
clothes (Radke & Trager, 1950), and
various incompleted stories which
offer an opportunity for a doll of one
color to “win” over a doll of another
color (Ammons, 1950; Ammons & 
Ammons, 1953; Goodman, 1952). 

Results obtained from these tech- 
niques, sometimes used in connection 
with more extensive interviewing 
(Ammons, 1950), are in fair agree- 
ment with one another. Negro and 
white nursery school children appear 
to be well aware of racial physical
differences (Ammons, 1950; Clark &
Clark, 1947; Goodman, 1952). Both 
racial groups are likely to identify 
with the white doll when asked 
“Which looks most like you?” (Clark
& Clark, 1947; Goodman, 1946), 
although with increasing age there is 
more correct identification until at 
age 7 a slight majority of Negroes 
identify with the Negro doll (Steven- 
son & Stewart, 1958). In addition, 
some of the Negroes show either 
confusion or wish fulfillment by in- 
sisting that although they are now 
dark skinned, they had white skins as 
babies (Goodman, 1946). There have 
been consistent reports that white 
dolls are preferred esthetically by 
white children, while Negro children 
do not show a clearcut preference for 
Negro dolls, but instead may either 
choose the white doll (Clark & Clark, 
1947; Goodman, 1946), show only 
a slight preference for the Negro doll 
(Radke & Trager, 1950), or show 
reluctance to make any choice (Goodman,
1952). Interpretation of this
result is not unequivocal, since it may
reflect past experience with dolls and
story book characters who are more
often white than colored. More important
would seem to be Radke and
Trager’s (1950) finding that, even 
when subjects are equated for social 
class, 5-7 year olds of both races 
accept the idea that Negroes belong 
in poorer housing. 

While it seems to have been dem- 
onstrated clearly by the method of 
doll play that children are capable of 
making discriminations on the basis 
of color, it has not been shown that 
these discriminations are reflected in 
fantasy behavior in any consistent 
way. Ammons (1950) reported that 
white boys showed a tendency with 
increasing age to use Negro dolls as 
scapegoats. On the other hand, 
analysis of the same data did not reveal
any differences between
Negro and white dolls in conflict:
whichever doll the subject was using
at the moment tended to be successful 
in aggression (Ammons & Ammons, 
1953). Stevenson and Stewart's 
(1958) Southern Negro subjects chose 
the white doll as the one with whom 
they would like to play, except at the 
oldest age level—7 years—where a 
small majority chose the Negro doll. 
Goodman (1946) found no social 
acceptability differences among sub- 
jects who had chosen the white doll 
esthetically. The white doll might be 
“prettier,” but the Negro doll was 
just as acceptable as a birthday party 
guest. In further studies, Goodman's 
(1952) subjects mixed the races in- 
discriminately in free doll play. 

This lack of consistent discrimina- 
tory behavior in doll play is paralleled 
by a similar unconcern in observed 
behavior. Goodman (1946) found no 
consistent prejudice in nursery school 
behavior of her mixed racial group. 
The results of doll play are congruent 
with those of other methods in finding 
a poor correspondence between beliefs 
and the development of interracial 
behavior. The evidence on this point 
has been reviewed by Harding, 
Kutner, Proshansky, and Chein
(1954). 

Hartley and Schwartz (1951) de- 
scribed materials and procedures for 
studying attitudes toward religious 
groups. Subjects were given three 
doll families, each of which stands in 
front of a montage background of 
photographs, one suggesting a Jewish 
religious context, one Catholic, and 
the other a middle class home without 
any religious symbols. The investi- 
gator notices what spontaneous 
identification the subject makes of 
the backgrounds, and uses these as a 
lead-in to a doll play interview with 
the child. The only data available 
from the use of this technique are 
some protocols, but it appears to be 
easily adaptable to the analysis of 
group differences. 


Experimental Manipulations


It is obvious, at this point, that 
the bulk of the studies has employed 
doll play to measure naturally exist- 
ing characteristics of the subjects, 
with no attempt to influence these 
characteristics. By contrast, projec- 
tive studies of adults have recently 
used experimental variations both to 
test specific hypotheses for which 
manipulation is relevant as well as to 
ascertain the validity of the measure- 
ment (e.g., McClelland, Atkinson, 
Clark, & Lowell, 1953). 

The four experimental studies of 
doll play divide into two groups: 
either some experiences of the child 
prior to doll play or experiences dur- 
ing the course of the procedure are 
varied. In the first category, Bach 
(1945), testing a frustration-aggres- 
sion hypothesis, subjected some of his 
preschool subjects to a longer rest 
period than others just before a doll 
play session. Since a long rest was
presumably frustrating, these children
elaborated the rest theme in
their fantasy output more often and 
more aggressively than did the short 
rest group. Yarrow’s (1948) results 
are less clear. He had one group of
subjects play with a difficult tinker 
toy before doll play and compared 
them to a group who were given an 
easy task. The frustrated subjects 
tended to show increased aggression, 
more tangential play, and more distorted
thematic play than the other subjects,
but the results were not statistically
significant. When the children ex- 
perienced antecedent satiation—put- 
ting pegs into boards until they re- 
fused to continue—they gave more 
inappropriate thematic units—e.g., 
sleeping in the kitchen. 

To test the effects of aggression 
anxiety on fantasy aggression, Sperry 
(1949) compared three groups of 
children, each of whom participated 
in four sessions. For one group the 
experimenter disapproved of the subject’s
aggression in the second session.
The experimenter disapproved of the
subject’s aggressive acts in the second 
and third sessions for another group 
and expressed no disapproval to the 
control group. Only the group pun- 
ished in the second session decreased 
their disapproved acts in the third 
period (Hollenberg & Sperry, 1951). 

Working also with a model of ag- 
gression inhibition, Levin and 
Turgeon (1957) compared two groups 
of subjects. The first group's second 
doll play session was observed by 
their mothers; in the other group a 
strange adult female was present. 
Mothers facilitated the children’s 
aggression whereas the stranger in- 
hibited socially disapproved acts. 

In general, doll play has suffered 
from a dearth of experimental treat- 
ment. Some experimental operations 
relevant to the variables being meas- 
ured would add to the validity of the 
method and, to judge from other 
projective techniques, would provide 
more discriminating measures of in- 
dividual differences. 


DISCUSSION 

What can we say now about the 
doll play technique, which two dec- 
ades ago appeared so promising? 
Certainly an overall body of sensible, 
interrelated findings is not apparent. 
Where doll play was used in a con- 
nected group of studies from one 
laboratory, coherent results do ap- 
pear. Otherwise, single investigators 
performing one or two studies using 
the method occasionally report inter- 
esting results but there are almost as 
many islands of findings as there are 
investigators. One might hope that 
the common method would provide 
the links between studies, but the 
flexibility of doll play, both in proce- 
dure and scoring of variables, makes 
the connections among findings ten- 
uous. 

In the area of aggression there are 
results that have been replicated. 
However, their very redundancy 
makes them appear trivial in comparison
to what might have been discovered
in the years of effort. We
may take as fact that young boys are 
more aggressive than young girls and 
that children are more aggressive in 
the second than in the first doll play 
session. Most other doll play findings 
have to be hedged with boundary 
conditions, and restrictions must be 
put on general statements. 

To understand this state of affairs 
it may be useful to review the virtues 
and shortcomings of doll play. We 
believe that the meager payoff comes 
not from the technique itself, but 
from the assumptions which underlie 
the method. First, what should any 
method of assessing personality pro- 
vide? Objectivity has not often been 
a problem in doll play so long as the 
variables are carefully defined and 
the scorers are well trained. Reli- 
ability has been looked at both in 
terms of the consistency of behavior 
and in terms of categorizing agree- 
ment among scorers. Besides, the 
method is not heavily dependent on 
verbalization, which recommends it 
for use with young children, and it is 
interesting to them. The major difficulties
appear in understanding what
the method is measuring. 


Replication and Wish Fulfillment 


The basic question that has influ- 
enced the understanding of doll play 
is whether the child is telling about 
events and hopes and plans which are 
available to him in his day-to-day 
world or whether his acts in this setting
are otherwise unavailable. The
criterion for identifying wish fulfilling 
fantasies is that nonfantasy expres- 
sion of the behaviors is prohibited 
and they are then expressed in fan- 
tasy. The prohibitions may be actu- 
ally imposed on the child or may 
result from natural conditions: e.g., 
his color or sex or size. Therefore, the 
specifications for wish fulfilling fantasies
are four: evidence that there
are in “real life” some restraints
against the expression of the behavior 
in question, a desire for such expres- 
sion, little overt manifestation of the 
behavior, and the appearance of the 
behavior in fantasy. 

Few research studies include all of 
the requirements of the wish fulfill- 
ment model. The studies of parental 
punishment and fantasy aggression 
make certain assumptions about the 
model, but both the results and the 
assumptions must be questioned since 
no study clearly replicates another. 
For example, it is assumed that 
severe parental discipline inhibits 
overt expression, yet there is some 
evidence that punishment and overt 
aggression are positively correlated 
(Sears, Maccoby, & Levin, 1957). 

In the studies of racial identification
and prejudice, the assumptions,
although not usually specified, are
often reasonable. For example, a 
substantial number of Negro children 
indicate in doll play that they want 
to play with white children (e.g., 
Clark & Clark, 1947). To take this as 
wish fulfilling behavior we need to 
know that such interracial play is not 
possible and actually does not occur. 
These inferences may be based on 
sociological characteristics of the 
child’s neighborhood, although it is 
preferable to test these assumptions 
directly. 

The inclusion of “wishes” under
the replication rubric requires some 
explanation. If the child’s wishes are 
not denied real expression, this cate- 
gory of behavior does not fit our wish 
fulfillment model. One way of think- 
ing about doll play behavior is that it 
gives the child an opportunity to 
express his current experiences and 
preoccupations. The correspondence 
between real life and fantasy need not 
be uninteresting for research purposes.
In this type of fantasy the child may 
give the researcher a picture of his 
thoughts and actions which would be 
much more difficult to elicit in an interview.
Also, so far as the child’s
functioning is concerned, replicative 
fantasy may well provide him an 
opportunity to practice and develop 
skills which are transferable to his 
nonfantasy life. 

To take advantage of the wish 
fulfillment-replication distinction in 
research, it would be most helpful if a 
child consistently acted either one or 
the other type of fantasy. Unfortu- 
nately, such is probably not the case. 
A child may change his emphasis 
from session to session or may vary 
the proportions of fantasy within a 
session. The ideal condition would 
allow the researcher to categorize a 
sequence of doll play as wish fulfill- 
ment or replicative. Our current 
knowledge about children’s fantasies 
precludes any such simple procedure
although, as we suggest below, there 
may be some guides in making this 
decision within doll play itself. 

Some researchable problems 
which would aid in distinguishing 
and making use of the differences 
between replication and wish fulfill- 
ment are suggested below: 

1. Without exception in the doll 
play studies reviewed the fantasies 
have been categorized in terms of 
simple counts of units. Molar se- 
quences of behavior and units of 
interaction which are now common in 
observations of adult interaction have 
not been applied to children’s fantasies.
For example, if the sequence
is “the father spanks the boy and
then the boy hits the father” we
might be more justified in tolerating
the notion for future tests that this is
a wish fulfilling episode compared to
“the father spanks the boy and the
boy cries.”

Likewise, doll play actions that 
indicate that inhibitions are being 
overcome may be discernible. The 
two indices that have been used are 
latency of the first aggressive act and 
the occurrence of tangential behavior.
The latter may be promising if
analyzed in a sophisticated fashion.
Tangential actions such as looking 
out the window or engaging the ex- 
perimenter in conversation which 
appear irrelevant to doll play may 
indicate a variety of states. The 
child may be bored, or unable to
think of more actions to portray,
or he may indeed be experiencing
anxiety over some impulse which is at
the threshold of experience. These 
possibilities could be studied within 
the doll play protocol. It would be 
interesting to see if the precursors to 
boredom are a sequence of redundant 
acts by the subjects. On the other 
hand, signs of disinhibition may be 
succeeded by behaviors we assume to
be generally prohibited or that have been
specifically prohibited for the subject.

In summary, we are saying that 
there exist in the usual doll play data
possibilities for more elaborate and 
potentially more profitable analyses 
than have so far been made. 

2. The above approach to the wish
fulfillment and replication problem
focuses on response measures. It is
our belief that the study or manipulation
of antecedent conditions also
may be a fruitful tack.

Our first suggestion is to make use
of detailed naturalistic information.
A log of the child’s experience for a
day or two prior to doll play might be
kept and the fantasy protocol compared
with what we know occurred in
the child’s life. A very detailed log is
represented by One Boy’s Day (Barker
& Wright, 1951). Such an approach
is clearly inductive and simply provides
a mass of data which may be
scrutinized for simple correspondences
or for more complex transformations
between real life and doll play
fantasy. For example, one could look
at the ways in which an objectively
described situation is filtered through
the child’s perceptions, and the results
might provide clues to types of experiences
which form the raw materials
for wish fulfillment compared
to those types of experiences which
are replicated with a high degree of 
fidelity. 

The above naturalistic approach 
may point up variables, which, 
through experimental variation, will 
provide more substantial causal rela- 
tionships between experiences and 
fantasy. For example, will a series of 
successes followed by a failure be 
fantasied as a success or failure? Does 
strongly goal oriented action that is 
not permitted consummation appear 
in doll play as goal achieved? Can a 
child be given a set to portray either 
wish fulfilling or replicative events? 

In essence, we are suggesting that 
an experimental approach to the 
antecedents of children’s fantasies 
has been tried very little and may 
provide substantial payoff. If signifi- 
cant antecedent manipulations are 
found, and their effects are potent 
and consistent across subjects, a 
more convenient response index to 
the two types of fantasy may appear. 
A case in point is the empirically 
derived scoring scheme for n Ach, 
which includes those categories of 
fantasy that respond consistently to 
experimental manipulation of arousal 
compared to neutral instructions. 
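
The contrast drawn in the first suggestion above, between simple counts of units and the coding of molar sequences, can be made concrete with a minimal sketch. Everything in it is an illustrative assumption rather than a scheme used in the studies reviewed: the (actor, act, target) coding of units, the category AGGRESSIVE_ACTS, and the rule that a "retaliation" sequence is an aggressive act immediately answered by an aggressive act from its target.

```python
from collections import Counter

# Hypothetical coded protocol: each unit is (actor, act, target).
protocol = [
    ("father", "spanks", "boy"),
    ("boy", "hits", "father"),
    ("mother", "feeds", "boy"),
    ("father", "spanks", "boy"),
    ("boy", "cries", None),
]

# Assumed category, for illustration only.
AGGRESSIVE_ACTS = {"spanks", "hits"}


def unit_counts(units):
    """Tally how often each act occurs, ignoring order (a simple unit count)."""
    return Counter(act for _, act, _ in units)


def retaliation_sequences(units):
    """Return adjacent pairs of units in which the target of an aggressive
    act immediately directs an aggressive act back at the original actor."""
    found = []
    for first, second in zip(units, units[1:]):
        actor1, act1, target1 = first
        actor2, act2, target2 = second
        if (act1 in AGGRESSIVE_ACTS and act2 in AGGRESSIVE_ACTS
                and actor2 == target1 and target2 == actor1):
            found.append((first, second))
    return found


counts = unit_counts(protocol)
sequences = retaliation_sequences(protocol)
print(counts["spanks"], len(sequences))
```

A unit count treats the two spankings alike, whereas the sequence rule separates the episode in which the boy hits back from the one in which he cries. Whether the first should be read as wish fulfilling remains, as the text says, a notion for future tests.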


Nature of Instigation in Doll Play 


One of the presumed virtues of doll 
play is that the amorphousness of the 
stimulus situation would permit wide 
expression of “person” variables.
Consequently, the preoccupations of 
the subject would be the major de- 
terminants of his fantasy responses. 
Recently, the contribution of the 
instigating stimulus itself has re- 
ceived serious attention in projection 
theory. For example, in the TAT 
measurement of need achievement 
the pictures were found to vary in the 
degree to which they elicited achieve- 
ment imagery. 

In doll play, we get the impression 
that the situation may be too broadly 
instigating for the purposes for which
the technique is often used. Since
this projective method is used to 
measure a wide variety of child be- 
haviors, it is questionable if it is an 
equally appropriate measuring device 
for all of the variables. The data 
imply quite certainly that doll play is 
a useful device for measuring fantasy 
aggression. Beyond that, the incidence
of other actions which may be
coordinated to such motivational
systems as dependency and achievement
appears to be meager. In other
words, the home as the miniature 
situation is associated with so many 
kinds of behavior that the researcher 
cannot be sure that the actions in 
which he is interested will appear with 
sufficient frequency to be useful. 

We can suggest two devices to
narrow the spectrum of instigation.
The first is to arrange a doll play setting
which calls forth the specific
behaviors upon which the study
focuses. For example, several studies
have used a school room rather than a
house when the researcher was interested
in school related behavior
(Bach, 1945; Melville, 1959) and one 
study used a play yard setting 
(Bremer, 1947) to study play related 
behavior. 

The second procedure focuses doll
play even more narrowly, and may be
thought of as analogous to the story
completion technique. The experimenter
presents the child with a
problem and then permits the child to
complete the action when the dolls
are given to him. Lynn’s recent
structured doll play test follows this
procedure; and the studies of prejudice
in which the child is asked to
make a choice of a doll are a second
example of the focused method.


SUMMARY 


This paper surveyed the development
and uses of doll play as a research
tool. Besides methodological
studies, the findings in five areas of
investigation which have used doll
play were summarized: aggression,
stereotypy, doll preference, effect of
separation from parents, and prejudice.
Although certain groups of
studies yield interrelated results, the
use of this research tool has been so
varied that the overall impression is
of many disparate findings, in spite
of the basic similarity in method. It
is suggested that a conceptual difficulty
underlying the studies has been
the lack of distinction between wish
fulfilling and replicative fantasies in
children.


TO SCHIZOPHRENIA


There is a growing number of social
and biological scientists who feel the 
need for a comprehensive theory of 
behavior—a theory of which schizo- 
phrenia in particular, or psycho- 
pathology in general, is only one 
facet. The theory should be broad 
enough to encompass data from such 
apparently diverse fields as anthro- 
pology, phylogenesis, human develop- 
ment, and states of lowered conscious- 
ness. Data from all of these areas 
contribute to our understanding of 
human behavior, and it would seem 
that the law of parsimony would be 
better served if these data could be 
subsumed under the same concepts 
and interpreted in terms of a com- 
mon set of principles. 

This paper attempts to outline a 
comparative-developmental approach 
to schizophrenia. It is comparative 
in that it relates data from the study 
of schizophrenia to many different 
fields of inquiry. It is developmental 
insofar as it is suggested by, and 
draws its basic facts from develop- 
mental studies—the development 
from conception to birth, the develop- 
ment from childhood to adulthood, 
the development from the single- 
celled organisms to man, and from 
developmental studies of human cul- 
tures. 

For the particular organization of
the approach to schizophrenia presented
here, the author accepts responsibility;
the original formulation
of the comprehensive comparative-developmental
theory is that by
Heinz Werner (1940) and his co-workers
at Clark University.

1 Presently with National Analysts,
Incorporated, 1015 Chestnut Street,
Philadelphia, Pennsylvania.

Werner’s comparative-developmental
approach aims at viewing the
total behavior of all organisms in
terms of a common set of developmental
principles. It is his belief that
such an approach is fruitful in coordinating,
within a single descriptive
framework, psychological phenomena
observed in phylogenesis, ontogenesis,
ethnopsychology, and psychopathology.
This paper confines itself to
what this theoretical position has had
to contribute to an understanding of
schizophrenia. It attempts to indicate
the comprehensiveness and heuristic
value of the approach without,
however, attempting to present an
exhaustive review of the large body of
relevant research.

Behavior proceeds through given
stages in its development. A formal
similarity obtains between the organization
and structure of processes in
young children, in organisms low on
the phylogenetic scale, in human
adults of technologically backward
societies, and in certain states of
lowered consciousness in educated
normal adults of technologically advanced
societies. In order for developmental
theory to encompass schizophrenic
processes it requires the
introduction of constructs which
suggest a parallelism of various aspects
of schizophrenia with developmental
patterns in all of these spheres
of inquiry, but especially with development
in childhood. To this end
developmental theorists have introduced
the concept of “regression.”
The progression seen in the normal
course of development is reversed in
pathology; thus, in schizophrenia we 
may expect to find a regression in the 
direction of greater primitivization of 
process. 

A frequently raised objection to 
developmental theory is that it seeks 
only generic similarities between 
various groups and tends to ignore 
their differences. 

Exploration of developmental the- 
ory does require seeking for system- 
atic patterns of generic similarities 
in cognitive performance among cer- 
tain groups. Thus focused on simi- 
larities, developmental theorists have 
not always taken explicit account of 
specific differences that have ap- 
peared between groups. 

The heuristic value of such an 
approach has already been demon- 
strated by the considerable number 
of investigations that have been pro- 
voked by or conducted under the 
purview of development theory. Its
clinical value is suggested by its
contributions to psychodiagnostic
testing, in particular to the scoring 
and interpretation of the Rorschach 
technique. Genetic theory does not 
question that differences exist be- 
tween the child and adult schizo- 
phrenic. It does hold that similarities 
in cognitive structure exist between 
young children and adult schizophrenics,
both of which are exemplifications
of an ideal construct, namely,
developmental primitivity. 

A word now about the use of the 
term “primitive” (Werner & Kaplan,
1956). Much of the criticism leveled 
at the use of this term is based on the 
assertion that it is moralistic in char- 
acter and thus has little place in 
scientific endeavor. No such evalua- 
tive connotation is intended. While 
“primitivity” is not evaluative in this
moralistic sense, it is evaluative in 
that it may either impede or facilitate 
attainment of certain goals or states. 
Primitivity pertains to the psychologically
prior stages of development.
In essence the concept of primitivity 
is a theoretical construct referring to 
a kind of cognition characterized 
by developmentally early processes. 
Processes that appear early in the 
development sequence—that is, early 
in childhood, or early in the temporal 
development of an idea—are more 
primitive than those which appear 
later in the sequence. 

The term “regression” as used by
Werner (1940) refers to the structural 
re-emergence of developmentally 
lower levels of functioning as the 
more advanced and more recently 
developed levels are disorganized. 
Regression in this sense differs in 
emphasis from the meaning given 
this term by psychoanalytic orthodoxy²
which focuses on impulses and
the methods by which these are 
gratified and controlled. While psy- 
choanalysis has emphasized the func- 
tion and content of psychopathology, 
the developmental approach con- 
siders only the formal structure of psy- 
chopathological processes. 

By similarity in process between 
childhood and pathological primitivi- 
zation reference is made to structural 
similarity, not to similarity in content.
The regressed adult is, of course,
not a child; rather, similar organizations
or forms of process are identifiable
in both. Our interest here is not
primarily in what children or schizo- 
phrenics think or perceive, but rather, 
how they think or perceive. Schizophrenia
thus is seen as a regression in
cognitive processes; that is, it is
conceived as a reversal of those patterns
of thinking, perceiving, and so
on, which are encountered in the normal
course of development. Further,
developmental theorists are not concerned
with the nature of the conditions
that have caused the regressed
behavior or the historical antecedents
of such conditions. Rather they
focus on the structural or formal
consequences of these predisposing
experiences.

2 Although Freud considered ego regression
as well as impulse regression, many psychoanalytic
practitioners are inclined to overemphasize
the latter at the expense of the
former.

It should be made clear that the 
psychoanalytic and the comparative- 
developmental approaches are not 
mutually exclusive; rather, they focus 
on different aspects of schizophrenia 
(Arieti, 1955). Each may be clinically 
useful and theoretically productive. 
Devoting attention in this paper to 
the structural point of view does not 
attribute less value or validity to the 
psychodynamic viewpoint. Where 
the psychodynamic approach is par- 
ticularly helpful in therapy, the 
structural approach is useful in de- 
veloping hypotheses, describing de- 
velopmental phenomena within a 
consistent framework, and—most im- 
portant to the clinician—it provides 
a gauge by which psychopathological 
states and modifications in those 
states may be assessed and under- 
stood in terms of developmental 
criteria (Siegel, 1953). The concept 
of schizophrenia which is proposed 
here proceeds from a basic developmental
principle: wherever development
takes place it originates in a
globality or lack of differentiation 
and becomes increasingly more differ- 
entiated, terminating in a state of 
integration. The development of 
motor coordination may serve to
illustrate this developmental principle.
When stimulated, the newborn
typically reacts with mass nondirected
motor activity. In the normal
course of maturation, this mass action
becomes more focalized and better
directed with respect to the stimulating
agent. That is, from the total
involvement of the whole body
emerges a differentiated activity of
certain parts of the body: arms,
legs, head, etc. These now differentiated
movements become integrated
into a single smooth-flowing response 
in which all parts of the body may 
participate appropriately in achiev- 
ing a goal or solving a task. 

Now let us turn to the separate 
functions that this approach encom- 
passes. In each case the comparison 
will be made between human onto- 
genesis and schizophrenia. 


EMOTIONAL BEHAVIOR 


Ontogenetic changes in emotional
behavior proceed along at least
three continua: (a) From overt
motor expression of emotion to increasingly
more internalized experience
of emotion. Crying (Bayley,
1932) and other motor activity decrease
with age. (b) From globality
of emotional experience to greater
differentiation (Bridges, 1932). At
first there are only undifferentiated 
affective states of relative excitement 
or quiescence. With development 
there is greater specificity of emotion. 
For example, global negative affect 
becomes more differentiated into in- 
creasingly more subtle nuances, such 
as hate, despise, contempt, dislike, 
etc. (c) From lability of emotional 
experience to increased stability. In 
the young child there is characteristically
momentary change in the
nature of his emotional experience
and its expression (Jersild, 1939).
What starts out as a laugh may end 
up in bitter tears or vice versa. Cry- 
ing can be quickly changed to gig- 
gling by a well intentioned and well 
placed tickle. 

In accordance with the regression 
hypothesis, in schizophrenia there is 
the expectation of a reversal in each 
of these three progressions: 

1. In the acute stage of the illness,
before chronicity becomes manifest
in affective blunting, emotion is uncontrolled;
impulse is expressed
overtly without adequate intellectual
intervention. Not only is the
expression of affect likely to be more
public, but there is an increase in the
degree of motor involvement. Thus,
the motoric hyperactivity of the excited
schizophrenic and the motoric
hypoactivity of the chronic “burnt-out”
schizophrenic both exhibit the
degree to which the emotional state is
syncretically (Werner, 1940) fused in
its expression with the motoric system.
Although the affective and
motoric are never wholly independent
(Wolff, 1943) of each other, the
immediacy, directness, and overtness
of this relationship tend to increase
in schizophrenia.

2. The increasing differentiation
and subtlety of feelings seen in ontogenesis
is reversed in schizophrenia.
Clinical practice, in particular experience
with the projective techniques,
reflects the dedifferentiation
of feelings. Aggressive and sexual
components are not infrequently
fused into an indistinguishable whole.
Even more striking is the blatant
admixture of positive and negative
impulses.

3. Though perhaps not to the
same degree, the emotional experience
of the acute schizophrenic is
similar to that of the young child in
that it, too, is highly labile and unpredictable.

* A comprehensive survey of developmentally
oriented research in childhood may
be found in Werner (1946).


PERCEPTION 


The progression from globality to 
differentiation to integration is per- 
haps best seen in perception. For the 
neonate and very young child the vis- 
ual field is not well organized or struc- 
tured. Figure and ground, contours, 
patterns of light and shadow, movement,
all merge into an undifferentiated
perceptual mass, or in William
James’ classic terminology, “a blooming,
buzzing confusion.” From this
globality emerge stages of increasingly
differentiated perception. Here
visual patterns acquire object-properties,
with definitive contours and
localization in three-dimensional space.
This development then terminates in 
a stage in which these differentiated 
aspects of the perceptual field are in- 
tegrated, or synthesized, into a single 
meaningful percept (Werner, 1940). 

This developmental sequence has
been corroborated by a number of
experiments, the most convincing of
which have used the Rorschach
blots as stimulus material (Hemmendinger,
1953). Use of this technique
reveals the following changes
to take place with increasing age.

Three-year-olds are whole-perceivers;
they see few details and their
perception is best described qualitatively
in terms of their undifferentiated
character. Four- and 5-year-olds
react less in terms of wholes and
more often notice and comment on
the parts. At 6 years another, and
distinct, change occurs: an abrupt
and marked increase in perceptual
responses to the small and rarely
noticed areas in the blots. This attraction
to tiny details is interpreted
as an intensification of the development
of differentiation. At 9 years
begins the final phase of perceptual
development, that of synthesis and
integration. This final phase terminates
in the appearance of predominantly
synthesizing activity. In the
integrated whole response, the blot is
perceptually articulated and then reintegrated
into a well differentiated
unified whole.

Having considered perceptual development
in children we would expect,
according to the regression hypothesis,
a reversal of this pattern in
schizophrenia. Further, we would
expect that the greater the pathology
the more immature the perception.

Experiments, particularly those by
Friedman (1953) and Siegel (1953),
reveal the following relationships in
perceptual function between schizophrenics
and children:

With respect to the develop- 
mentally immature response, there 
exists no significant difference be- 
tween children and schizophrenics, 
and both groups differ significantly 
from normal adults. The same is true 
of the most advanced percepts. The 
integrated whole response discrimi- 
nates each of the three groups from 
each other. Thus, these findings 
justify the conclusion that schizo- 
phrenics, in some respects, respond 
perceptually in a manner similar to 
that of children, and in other aspects, 
they occupy an intermediate position 
between normal adults and children. 
This may be understood in terms of 
the hypothetical construct of regres- 
sion. In this regard regression seems 
evident, but it is not of such a total 
nature as to completely eradicate the 
history of the individual who has 
once operated on a higher develop- 
mental level. 

Now, what may be said regarding 
the schizophrenic subtypes? There is 
little or no evidence on which to discriminate
the perceptual functioning
of the hebephrenics and catatonics
from each other, and no work has
been done with simple schizophrenics.
However, developmentally comparing
paranoid schizophrenics with the
combined hebephrenic and catatonic
group (Siegel, 1953) we find the following:
while the perception of paranoid
schizophrenics is typically fractionated
and fragmented with emphasis
on perceptual analysis, resembling
the performance of children
from 6 to 10, that of the hebephrenic
and catatonic schizophrenics is characteristic
of the global, amorphous
perceptual activity of 3-5 year old
children.

Comparative-developmental theory
thus permits the location of catatonics,
hebephrenics, and paranoids
on a developmental scale. In all
aspects of cognitive functioning, in
addition to perception, paranoid
schizophrenics are expected to perform
more like the normal adult than
the catatonic or hebephrenic schizophrenic.
It does not, however, attempt
to state the conditions which
facilitate or inhibit the depth of regression
in these diagnostic categories.
At this stage in its development
the theory has paid relatively
little attention to motivational aspects
of schizophrenia. Among clinical
practitioners this conceptual vacuum
has been filled by psychodynamic
theories.

There are other aspects of percep- 
tual development and regression that 
are instructive here: 

The extreme lability that we see in 
primitive emotional behavior is also 
seen in the perceptual sphere. Those 
who have worked intensively with 
schizophrenics or with young children 
cannot avoid being impressed by the 
extreme lability of their attention. 
This, in both the child and the
schizophrenic, may be attributable to
a kind of perceptual passivity in
which competing stimuli have equal
potential for evoking a perceptual
response. This notion of stimuli
equipotentiality may be useful in
understanding the severe stimulus
boundedness of the child and schizophrenic.

The child is stimulus bound in that
the stimulus must be attended to. An
infant’s eyes must follow the hand
that goes before it. His hand must
grasp the object that is placed in it.

The schizophrenic is similarly stimulus
bound. Stimuli that compete
for a perceptual response cannot be
adequately discriminated in terms of
their relevance to a task. Thus, the
schizophrenic complains of a rapidly
shifting, kaleidoscopic world. A patient
seen by the author complained
continually that he could not attend
to anything for very long because
everything and anything disrupted
his thoughts. Apparently irrelevant
details demanded his attention.
A noise outside, lights passing
by at night, an apparently random
thought, or a bodily sensation had
as strong a demand on his attention as the
topic being discussed or the task at
hand. This extreme interpenetration 
of the schizophrenic’s attention and 
thought by apparently random stim- 
uli is a well known phenomenon and 
has been well described by Cameron 
(1939), Kasanin (1944), and others. 


LEARNING 


The developmental approach to 
learning derives from the notion 
that development is characterized by 
qualitatively different processes and 
modes of organization, rather than 
by simply quantitative variations in 
process. This approach is therefore
in opposition to those theoretical orientations
which view learning as reducible
to a single process. Developmental
theory does not conceive of
any one process as being paradig- 
matic of the whole range of human 
learning. A view which reduces all 
learning to a single process conceives 
of the adult as having available more 
response alternatives than the child. 
A genetic point of view conceives of 
the adult and child as utilizing dif- 
ferent processes which may not be 
distinguishable in terms of efficiency 
or achievement. 

Developmental theorists thus seek 
to understand the nature of human 
learning through the exploration of 
qualitatively distinct organizational 
stages. Such an exploration was un- 
dertaken in a recent study by Gold- 
man and Denny (in press). They 
presented two kinds of learning tasks 
to children 5-14 years old. Performance
in the first learning task depended
on apprehending the regular
pattern of the pre-established program
(response to two switches in a
right-right-left-left sequence). Per- 
formance in this task increased 
steadily with age and IQ. In the sec- 
ond task rewards were received ac- 
cording to a predetermined, random 
“probability” program in which one 
response was rewarded 25% of the 
time and the other response was re- 
warded 75%. Performance in this 
task was essentially invariant with 
age and IQ with the trend somewhat 
favoring the younger children. Inso- 
far as these developmental curves 
were strikingly different they were in- 
terpreted as indicating that the per- 
formances on the two learning tasks 
reflected different processes. Insofar 
as the sequential, or “recursive,” 
task required an active seeking for a 
general rule for its solution, it was 
interpreted as requiring a more ad- 
vanced mode of functioning than 
that on the probability or “stochastic”
task which permitted a more passive
orientation to the task in that it
did not provide for such an easily 
generalizable solution. 
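
The difference between the two tasks can be illustrated with a short sketch. It rests on assumptions not stated in the text: that on each stochastic trial exactly one of the two responses is rewarded, that the subject's choices are independent of the random schedule, and that the recursive program simply repeats right-right-left-left. The function names are hypothetical and the figures are arithmetic consequences of those assumptions, not results from the study.

```python
def recursive_schedule(n_trials, pattern=("R", "R", "L", "L")):
    """Correct response on each trial of the sequential (recursive) task,
    assuming the right-right-left-left pattern repeats without change."""
    return [pattern[i % len(pattern)] for i in range(n_trials)]


def expected_reward_rate(p_choose_high, p_high=0.75):
    """Expected proportion of rewarded trials on the stochastic task.

    Assumes exactly one response is rewarded on each trial, the more
    often rewarded response with probability p_high, and that the
    subject picks that response with probability p_choose_high,
    independently of the schedule.
    """
    p_low = 1.0 - p_high
    return p_choose_high * p_high + (1.0 - p_choose_high) * p_low


first_six = recursive_schedule(6)
always_high = expected_reward_rate(1.0)   # always pick the 75% response
matching = expected_reward_rate(0.75)     # pick it on 75% of trials
guessing = expected_reward_rate(0.5)      # pick at random

print(first_six)
print(always_high, matching, guessing)
```

Under these assumptions a subject who grasps the rule can be correct on every recursive trial, while on the stochastic task no strategy exceeds a 75% reward rate. This is one way to read the statement that the second task does not provide an easily generalizable solution.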

A third learning process that may
represent the most primitive level for
humans is classical conditioning, in
which the stimulus is presented
wholly at the discretion of the experimenter
and the response is usually of
a physiological or reflexive nature.
Developmental studies of classical
conditioning suggest that conditioned
responses can be established very
early in life and indeed that young
children can be more easily conditioned
than older children and
adults (Jones, 1928, 1930a, 1930b;
Kasatkin & Levikova, 1935; Mateer,
1918; Razran, 1933, 1935). The
developmental primitivity of classical
conditioning is further suggested
by studies which indicate that susceptibility
to conditioning is enhanced
in states of lowered consciousness
(Leuba, 1940, 1941; Scott, 1930).

Thus, at least three modes of learning
are suggested which, in the order from
most primitive to most advanced, are:
learning by classical conditioning,
stochastic learning (instrumental
conditioning), and recursive learning
(problem solving). The first level
appears to be characteristic of the
learning of very young children and of
infrahuman animals. Here the learner is
a kind of passive “victim” of his
environment in that he does little of an
active nature to learn; learning, the
pairing of stimuli and response, is
imposed upon him.* The second learning
mode is distinguished from the first in
that the learner is active or
“instrumental” in the learning process,
yet the learning process is essentially
by rote. In this learning mode young
children and adults do equally well, as
do subjects of varying intelligence. The
third learning mode is not only the most
active in that there is a deliberate
seeking for order and regularity, but
there is a vigorous development and
testing of solution hypotheses. This
learning mode favors older and more
intelligent subjects.

With growth, phylogenetic and
ontogenetic, classical conditioning is
less adaptive and recedes to the
background until called upon when the
task situation calls for no more
profound level of intellection. The
other modes of learning emerge later to
better serve the individual’s needs.

In schizophrenia it is proposed that
this development is reversed, with
sequential learning and other forms of
complex learning situations being
affected most and classical conditioning
ascending in relative importance.

* A similar viewpoint was expressed by
Gesell (1938).

Schizophrenics have been found to be
more readily conditioned than normals in
relatively simple situations in which
the response alternatives are limited
and the response reflexive. This has
been demonstrated
for the knee jerk (Pfaffman & 
Schlossberg, 1936), the psychogal- 
vanic response (Mays, 1934; Ship- 
ley, 1934), and eyeblink (Spence & 
Taylor, 1953). Schizophrenics have 
also been shown to exceed neurotics 
in eyeblink conditioning (Taylor & 
Spence, 1954). However, since some 
studies have failed to demonstrate 
the greater conditionability of schizo- 
phrenics over normals (Howe, 1958; 
Paintal, 1951), the question is raised 
as to what stimulus conditions en- 
hance the establishment of the con- 
ditioned response in schizophrenics 
as compared to normals. 

In accordance with the regression 
hypothesis, the increase in suscepti- 
bility to conditioning in schizo- 
phrenia should be accompanied by a 
decrement in performance of complex
tasks. By “complex” tasks are meant
tasks which permit wide response
alternatives, among which
are many irrelevant ones, and in 
which an active role of the learner is 
required. Schizophrenics have been 
found to perform poorly relative to 
the performance of control normals 
in these complex tasks (Cameron, 
1939; Hanfmann, 1939; Hanfmann & 
Kasanin, 1942; Rapaport, 1945). 

The increased conditioning performance
and the decreased performance in complex
tasks, in schizophrenia as compared to
normals, have been interpreted by
Mednick (1958) and other learning
oriented theorists (e.g., Taylor &
Spence, 1954) in terms of the effect of
drive intensification (anxiety) on the
response strength of the conditioned
response. A difficulty with this type of
Hullian interpretation is that it fails
to take into account developmental data.
The superior performance of children
and infrahuman animals relative to
normal adults in conditioning experi- 
ments can hardly be incorporated 
within such a theoretical framework 
unless one postulates the existence of 
a heightened drive state in these more 
primitive organisms. Genetic theory 
offers the parsimonious incorporation 
of data from all of these areas within 
a single theoretical structure. 

When a stable stimulus-response
relationship has been established the
response may be elicited by other
stimuli similar in some manner to the
initial stimulus. This is stimulus
generalization.

The genetic principle that
differentiation proceeds from an initial
stage of globality would suggest that in
development stimulus generalization
would decrease. Reiss (1946) found that
young children tend to generalize
readily to homophones but this tendency
disappears at about 11 years of age.
Mednick and Lehtinen (1957) found that
amount of stimulus generalization
reactivity, measured along a
visual-spatial dimension of similarity,
was significantly greater for younger
children (7-9 years) than for older
children (10-12 years).

The expectation then would be that in
schizophrenia stimulus generalization
would be higher than in normals of
comparable intelligence. A number of
studies testify that this is so
(Cameron, 1938; Garmezy, 1952; Mednick,
1955).


THINKING AND LANGUAGE 


Thinking and language may be
investigated from the vantage of many
dimensions. Three which appear to the
author to be most central and inclusive
are the development from idiosyncrasy to
consensuality of concepts, from lability
to stability of concepts, and from
contextualization to autonomy of
concepts.

The development from idiosyncrasy to
consensuality refers to the increasingly
more public and predictable thinking of
which the child becomes capable as he
grows older
(Pollack, 1953; Werner & Kaplan, 
1952). Thus, the agreement in the 
meaning of words among members of 
a given speech community increases 
with age. Children, in contrast to 
adults, use words in a private, highly 
individualistic manner (Hayakawa, 
1954). 

In psychopathological regression 
the development toward greater con- 
sensuality in thinking is reversed. 
Idiosyncratic thought then reduces 
the schizophrenic to virtual social 
isolation (Cameron, 1938; Goldman, 
1960). 

The second dimension is the devel- 
opment from lability to stability of 
concepts. In the young child con- 
cepts are typically labile (Pollack, 
1953). The nature of the concept 
changes rapidly and in a seemingly 
capricious manner (Eng, 1931). 

An example from performance on 
the Object Sorting Test (Rapaport, 
1945) may serve to illustrate concept 
lability. The test consists of a num- 
ber of everyday, common objects 
that are placed on a desk before the 
subject. The typical adult, when 
asked to place these objects into 
meaningful groups so that the ob- 
jects within any one group belong to- 
gether, will form objects into groups 
according to their color, or material, 
or perhaps their use. A subject may 
pick out all red objects and put them 
together, or all wooden objects, or all 
tools. Young children will frequently 
switch the relationship in a very la- 
bile manner (Reichard, Schneider, & 
Rapaport, 1944). Thus, a young child
will first select a red ball and then
place it with a red plate, the two
objects having redness in common. Then a
toy knife is selected because it goes on
the table, too, like the red plate, and
then pliers are chosen because they are
metal like the knife, and then a pipe
because “the workman uses the pliers and
smokes a pipe.”

Similar chain concepts are developed by
schizophrenics in the same task
situations. The response of a young
schizophrenic girl in a task involving a
linear schematization technique may
serve as an illustration of the extreme
equivocality, or lability, of the
relationship between the symbol and the
meaning it symbolizes (Goldman, 1960).
Linear schematization requires the
subject to represent a word, in this
case a mood term, by drawing a line. The
subject is asked to draw an “angry”
line, or a line that expresses the word
“misery,” and so on. This subject was
asked to draw a line that represented
the word “healthy.” She drew a series of
different lines. When asked what there
was in the lines she drew that suggested
health she responded: “A seven upside
down, lightning going up, the medusa,
and this is the medical sign of health.”
While the patient could not clarify the
way in which all of these concepts are
related to health, the response invites
speculation about the way each thought
was related to the one that preceded it.
While the experiment was in progress she
was drinking 7-Up and remarked that it
was “good for you.” Lightning going up
may represent a denial of the
destructive (i.e., unhealthy) effects of
lightning. The medusa may be related to
“the medical sign of health” (the
caduceus) by clang association, or by
the snakes which are common to both.

In the extreme case concept lability may
be reflected in one word or symbol
subsuming not only different concepts
but opposite ones. This has been
established in dreams (Jones, 1913), in
archaic language (Freud, 1950), and also
in schizophrenia (Goldman, 1960).

The equivocal nature of symbol 
meaning in childhood and in schizo- 
phrenia appears to be determined by 
the close bond between the symbol 
and some particular situation, event, 
or person with which it is associated. 
This is the third dimension—the de- 
velopment from contextualization to 
autonomy of a concept. Concepts in
childhood are determined by personally
relevant experience (Binet, 1916;
Chodorkoff, 1952; Feifel, 1949;
Hayakawa, 1954; Kasanin, 1944; Terman,
1916). A newspaper, for example, may be
defined as “what the paper boy brings
and you wrap the garbage with it”
(Hayakawa, 1954, p. 80). With growth
these concepts become increasingly
independent or autonomous of these
personally meaningful contexts (Werner,
1940; Werner & Kaplan, 1950, 1952).

In schizophrenia we expect the reverse
of this development: concepts should
become increasingly less autonomous and
more contextualized. There is extensive
evidence, clinical and experimental
(Arieti, 1948; Baker, 1953; Cameron,
1938; Goldman, 1960; Kasanin, 1944),
that this is so.

The vocabulary test performances lend
further credence to the statement that
in comparison to normals, schizophrenics
tend to use words in terms of their
concrete functions rather than in terms
of abstract autonomous properties
(Chodorkoff, 1952; Feifel, 1949;
Harrington, 1954; Yacorzynski, 1941).

This regression may be illustrated by
referring again to linear
schematization. A group of
schizophrenics were asked to represent
the meaning of a word in a line. Then
inquiry was made into the relationship
between the line and the word it
expressed. Typically, the line was
justified in terms of some personally
relevant experience. For example, the
word “gentle” was represented by a
patient as a hay stack when she replied
to the inquiry with “lying in the hay is
gentle.” Another patient drew two lines
which she said represented the path
taken by the hand of a mother “gently”
caressing a child. Still a third patient
represented the word “gentle” with a
leaf, which “is ‘gently’ blowing in the
breeze.” Gentleness in all of these
cases is represented by unique personal
experiences and associations. Similarly,
in the Object Sorting Test,
schizophrenics are more inclined than
normals to relate objects in a highly
personal manner: “All of these things
were in my mother’s house” or “I think
they are all pretty.”

Thus, three dimensions of concepts are
suggested. Underlying the first,
idiosyncrasy-consensuality, is the
increasing stability of concepts. A
concept must be stable in reference
before it can be public, or consensual.
Underlying, in turn, the second
dimension, is the contextuality-autonomy
dimension. If a concept has meaning only
in terms of personal contexts, its
reference will be as labile as one’s
personal experiences, and therefore not
available for use as a vehicle for
social interaction.

The second and third dimensions both
reflect the developmental progress from
globality to differentiation, and its
dedifferentiation in psychopathological
regression. To the extent that a concept
is labile, or in the extreme, in that it
encompasses opposite meanings, it is
undifferentiated. In schizophrenia the
vehicles of thinking and communication
become progressively dedifferentiated in
that they, the symbol and referent, are
not related in a stable manner. With
regard to contextualization it may be
said that the more autonomous a meaning,
the more it is differentiated from a
particular context. Thus, in development
there is progressive meaning-context
differentiation, while in schizophrenia
meaning and context are
dedifferentiated.

Normal subjects more frequently 
reflect less situational meanings and 
attempt to represent some essential 
quality of gentleness. The word 
“gentle” is typically symbolized by 
normals by a light curved line,
expressing the “soft,” “light” aspects
of “gentle.” The autonomous meaning of a
word is essential in that it abstracts
from each of the many situations with
which it is associated (lying in hay,
mother caressing child, etc.), a
commonality that each shares.
The essential meaning of a concept is 
abstracted from but is relatively au- 
tonomous of concrete contexts. 


SOCIALIZATION 


In the development of social be- 
havior we again see the increasing 
differentiation out of the state of 
globality which terminates in inte- 
gration. We have little reason to be- 
lieve that in the neonate the self is 
distinguished from others. Accord- 
ing to psychoanalytic theorists the 
mother, her breast, her voice, the 
warmth of her body, the sensations 
from within the infant’s own body, 
are an indistinguishable whole. With 
development, there is an increasing 
awareness of the self as an entity. 

The development in social integra- 
tion is seen in patterns of play 
(Buehler, 1935; Loomis, 1931). At 
first, young children play in isolation 
with their hands, feet, or other ob- 
jects. Later, children prefer to play 
in the presence of other children— 
not with other children, but in
“parallel” play. Differentiation has
taken place, and this first step toward
integration will eventually lead to
genuine interpersonal interaction.

This development toward social in- 
tegration is also seen in the increas- 
ing complexity of the social groups, 
and in their increasing stability 
(Zaluzhni, 1930). 

In schizophrenia we find similar 
processes, except in reverse. On the 
ward we can see interaction repre- 
senting all of these phases. The sus- 
picious, hostile paranoid that still 
seeks social interaction; the hal- 
lucinating, babbling chronic schizo- 
phrenic that somehow still prefers to 
hallucinate and babble in the pres- 
ence of others, although not with or 
in concert with others; and finally, 
the totally regressed isolate who with- 
draws into the social vacuum of a 
corner of the ward and devotes him- 
self to his own bodily sensations. 


MOTOR FUNCTIONS


One of the most striking developments to
take place in the motor sphere is the
increase in the implicitness of motor
activity. Vicarious movements replace
overt activity in reasoning, problem
solving is less vocal and more silent,
motion in general is less gross.

Relative to the massive debilita- 
tion in other spheres there is rela- 
tively little motor involvement in 
schizophrenia. It is only in the most 
severe regression that motor impair- 
ment is found, such as in catatonic 
cerea flexibilitas, and in the
hyperactivity and restlessness that
sometimes characterizes the acute stage
of schizophrenia. In chronic
schizophrenia, too, there is frequently
evidence of incessant repetitive
movements of head, trunk, or limbs.

The fact that there is little motor
involvement in schizophrenia, except in
severe cases, is consistent with
Hughlings Jackson’s principle that those
functions which are the latest to
develop are the first to be impaired in
pathology. Since motor functions are
amongst the first to develop in infancy,
we would therefore expect impairment in
this sphere to develop last.

There are other dimensions that 
have not been considered. In each of 
those that have been discussed focus 
has been on structural similarities
between young children and schizophrenic
functioning. Such similarities in
process are also distinguishable in
primitive cultures and in
states of lowered consciousness, such 
as dreams, drug states, and hypno- 
gogic conditions. 

A comparative-genetic approach is 
fruitful in our effort to understand 
the essential nature of schizophrenia 
because it seeks to expose process 
rather than assess achievement and 
it is an approach in which structure 
is no less important than content and 
function. 

Although a structural point of 
view has been central in the systems 
of some theorists for some time 
(Arieti, 1957; Munroe, 1955; Rapaport,
1951a, 1951b), psychoanalytic orthodoxy
has not given sufficient attention to
structural elements until recently.
Having concerned itself in its early
development predominantly with primary
process, psychoanalysis is now turning
increasingly more to a consideration of
secondary process. Merton Gill (1959)
has formalized this emphasis of the
structural point of view in
psychoanalysis. This more energetic
psychoanalytic consideration of ego
functions, and the theoretical approach
that has been offered in this paper have
a similar goal: the formulation of a
comprehensive theory of human behavior.
Such genetic approaches remind us that
in our consideration of the
schizophrenic, oral deprivation is a no
more significant datum than is the
inability to conceive of square things
in terms of their squareness.


THE RELIABILITY OF A RESPONSE MEASURE:
DIFFERENTIAL RECOGNITION-THRESHOLD SCORES

DONN BYRNE
University of Texas

AND

JOAN HOLCOMB


The problem of reliability of meas- 
urement is a familiar one in the con- 
text of test construction and test 
evaluation. In other types of investi- 
gation, however, responses frequently 
are measured in a variety of ways by 
a variety of scoring procedures (often 
a priori ones) without evident con- 
cern about measurement. Although 
psychometric issues appear to be for- 
eign to experimental methodology, 
any specified set of stimuli may be 
conceptualized as a test and the 
quantification of subjects’ responses 
as test scores. Viewed in this way, 
such scores should be evaluated ac- 
cording to accepted standards for 
psychological tests (American Psy- 
chological Association, 1954). 

An investigation of the validity of 
a response measure is usually implied 
in the design of an experiment; relia- 
bility, however, is often ignored. 
Whether results are positive or nega- 
tive with respect to one’s hypothe- 
ses, reliability of measurement can 
assume great importance. Psycholo- 
gists have the disconcerting tendency 
to create a new methodology for each 
experiment. In work on perceptual 
defense, for example, it is difficult to 
find any two studies in which the 
same stimulus is presented in the 
same way to evoke responses which 
are quantified in the same manner. 
It seems reasonable to hypothesize 
that some of these stimuli presented 
to subjects in a particular way are 
going to yield more reliably measured 
response dimensions than others. 
With a heterogeneous methodology 
and unknown reliability coefficients,
it should not be surprising to find some
degree of inconsistency across
experiments not attributable to
theoretical weaknesses. Thus, generally
positive results mixed with some
negative results may reflect
differentially reliable “tests.” Even
the positive results are no guarantee
that reliable measures have been
employed. McNemar (1960) suggests
several factors which act to make
published results more likely to involve
a false rejection of the null hypothesis
than the .05 level of significance would
suggest. In addition, it would seem
logical to construct reliable measuring
techniques as a preliminary step in
experimental work rather than as an
afterthought.


DIFFERENTIAL RECOGNITION 
THRESHOLDS 


As an example of inadequate measurement
techniques, some of the “new look”
experiments in perception of the past
decade will be briefly
reviewed. In studies of perceptual de- 
fense, differential recognition thresh- 
olds for emotionally toned vs. neu- 
tral stimulus material have fre- 
quently served as the dependent var- 
iable and as a measure of individual 
differences in defensiveness. 

All four types of reliability should 
be considered in utilizing a dif- 
ferential recognition-threshold score. 
First, if any subjectivity is involved 
in the scoring process, there should 
be some determination of the extent 
to which independent judges are able 
to arrive at approximately identical 
scores. Interscorer agreement is a
necessary, but not sufficient, condition
for reliability of measurement.
Unfortunately, many investigators
determine only the reliability of the
scoring procedure rather than of the
scores themselves. Second, if a series
of discrete, presumably homogeneous
responses are combined to form a
total score, it is important to deter- 
mine the extent to which this score 
is internally consistent. Third, if the 
score is considered to be indicative of 
an enduring personality character- 
istic, it is essential to know the extent 
to which this score is stable over 
time. Fourth, if a different but theo- 
retically equivalent set of stimuli is 
employed to elicit responses, the 
equivalence of the two sets of scores 
should be determined. 

A review of perceptual defense and 
related studies which have used a dif- 
ferential recognition threshold sug- 
gests that a thorough examination of 
reliability is unusual. As might be
anticipated, the reliability coefficient
which is most frequently reported is
that of interscorer consistency, and the
results are generally quite good
(Eriksen, 1951a, 1951b; Kogan, 1956;
Lazarus, Eriksen, & Fonda, 1951; Stein,
1953). Internal consistency is less
frequently investigated, and the
reported coefficients range from good
(Vanderplas & Blake, 1949) to mediocre
(McClelland & Liberman, 1949) to
unsatisfactory (Eriksen, 1951a, 1951b).
Holtzman and Bitterman (1956) reported
that perceptual thresholds for taboo and
neutral words were unreliable measures;
therefore, they were eliminated from a
factor analytic study. An investigation
of the stability of differential
recognition thresholds over time was not
reported in any of the studies reviewed.
Stein’s (1953) data indicate that
equivalent forms of the stimulus which
he used yielded very similar results.
The majority of the studies using a
differential threshold score as
a variable report no reliability infor- 
mation (Beier & Cowen, 1953; Car- 
penter, Wiener, & Carpenter, 1956; 
Chodorkoff, 1954; Cowen & Obrist, 
1958; Greenbaum, 1956; Kissen, 
Gottesfeld, & Dicks, 1957; Kurland, 
1954; Postman & Brown, 1952; 
Smith, 1954; Spence, 1957; Wiener, 
1955; Zuckerman, 1955). 


AN UNRELIABLE SCORE 


The senior author planned to use 
the differential recognition threshold 
for hostile vs. neutral words pre- 
sented tachistoscopically as a cri- 
terion measure for a new test de- 
signed to measure repressing and sen- 
sitizing defenses. It should be con- 
fessed that the reliability investiga- 
tion was undertaken only when cer- 
tain difficulties were encountered. 

Twenty pairs of hostile and neutral
words were each matched for length,
initial letter, and frequency of
occurrence in one million words
according to the Thorndike-Lorge (1944)
word count. Hostile words were defined
as those representing behavior involving
the derogation, injury, or destruction
of either animate or inanimate objects.
Neutral words
were defined as those which were not 
emotionally toned. A word was as- 
signed to either category on the basis 
of the unanimous agreement of three 
independent judges. The 40 words 
were placed on slides and arranged in 
random order. 

The slides were used with a Key- 
stone Overhead Projector equipped 
with a Flashometer. Following a 
demonstration with a neutral prac- 
tice word, each stimulus word was 
presented at 1/100, 1/75, 1/50, 
1/37.5, 1/25, 1/10, and 1 second. 
After each trial, the subject responded
by writing down his best guess as to the
word presented. Subjects were seen in
small groups.

The threshold for each word consisted of
the first trial on which that word was
correctly recognized. Scores ranged from
1 (correct recognition at the 1/100
presentation) through 8 (failure to
recognize the word on any trial). A
subject’s mean threshold on the 20
neutral words minus his mean threshold
on the 20 hostile words yielded a
defense score.
Presumably, a positive score would 
indicate a sensitizing reaction and a 
negative score a repressing reaction. 
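
The scoring rule just described can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names `word_threshold` and `defense_score` and the sample protocol are hypothetical, and a guess is counted as correct by exact string match, whereas the original scoring was done by judges.

```python
from statistics import mean

# Exposure durations in seconds, in order of presentation (taken from the text).
EXPOSURES = [1/100, 1/75, 1/50, 1/37.5, 1/25, 1/10, 1.0]

def word_threshold(guesses, target):
    """Return the 1-based trial of first correct recognition, or 8 if none.

    Assumption: "correct" means an exact match after trimming and lowercasing.
    """
    for trial, guess in enumerate(guesses, start=1):
        if guess.strip().lower() == target.lower():
            return trial
    return len(EXPOSURES) + 1  # 8: not recognized on any of the 7 trials

def defense_score(neutral_thresholds, hostile_thresholds):
    """Mean neutral threshold minus mean hostile threshold."""
    return mean(neutral_thresholds) - mean(hostile_thresholds)

# Hypothetical protocol: one subject's seven written guesses for one word.
guesses = ["", "star", "stab", "stab", "stab", "stab", "stab"]
print(word_threshold(guesses, "stab"))

# Hypothetical thresholds for three matched pairs; a positive score means the
# hostile words were recognized at briefer exposures than the neutral words.
print(defense_score([4, 5, 3], [3, 3, 2]))
```

With 20 matched pairs per subject, the same two functions would be applied to all 40 words before taking the difference of means.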

Disappointing results in cross-vali- 
dating the test that was being devel- 
oped led to a belated investigation of 
the reliability of the criterion. Dif- 
ferential thresholds were obtained 
for almost 600 subjects, men and 
women enrolled in general education 
courses at San Francisco State Col- 
lege. From this total, a sample of 50 
was drawn. Because some subjec- 
tivity enters into the determination 
of the trial on which correct recogni- 
tion first occurs, the authors scored 
these protocols independently. The 
defense scores had respectable inter- 
scorer consistency as shown by a cor- 
relation of .91. 

The second type of reliability considered
was internal consistency.
When differential thresholds are ob- 
tained, it is assumed that there is 
some response homogeneity with re- 
spect to stimulus content. In this 
study, responses to 20 of the pro- 
jected words should have been de- 
termined in part by their common 
reference to hostility, and these re- 
sponses should to some extent differ 
from those evoked by the 20 non- 
hostile words. Therefore, split-half 
reliability was determined by divid- 
ing the hostile words into odd and 
even groups and computing the
differential threshold scores for these
two groups compared to their matching
neutral words. The coefficient of
internal consistency was .00. It was 
not deemed essential to apply the 
Spearman-Brown correction formula. 
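
A minimal sketch of the odd-even procedure follows, assuming thresholds are stored as subject-by-pair arrays. The function names, the array layout, and the simulated data are assumptions made for illustration; they are not the authors' materials. The Spearman-Brown step-up formula, 2r / (1 + r), returns .00 for a half-test correlation of .00, which is why applying it here would change nothing.

```python
import numpy as np

def split_half_reliability(hostile, neutral):
    """Odd-even split-half correlation of differential threshold scores.

    hostile, neutral: arrays of shape (n_subjects, n_pairs); column j of each
    holds the thresholds for the j-th matched word pair.
    """
    diff = np.asarray(neutral, dtype=float) - np.asarray(hostile, dtype=float)
    odd = diff[:, 0::2].mean(axis=1)   # pairs 1, 3, 5, ...
    even = diff[:, 1::2].mean(axis=1)  # pairs 2, 4, 6, ...
    return float(np.corrcoef(odd, even)[0, 1])

def spearman_brown(r_half):
    """Estimated full-length reliability from a half-test correlation."""
    return 2.0 * r_half / (1.0 + r_half)

# Simulated data only: 50 subjects, 20 word pairs, thresholds scored 1 to 8
# at random, so no consistent individual differences are built in.
rng = np.random.default_rng(0)
hostile = rng.integers(1, 9, size=(50, 20))
neutral = rng.integers(1, 9, size=(50, 20))
r_half = split_half_reliability(hostile, neutral)
print(round(r_half, 2), round(spearman_brown(r_half), 2))
```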

Thus, independent judges agreed 
about the nature of the stimulus ma- 
terial and about the scoring of the 
subjects’ responses. Nevertheless, the
resulting scores were unreliable. In
view of this finding and the
considerations discussed earlier, it is
suggested that, whenever possible, any
study should include a report of the
reliability of its response measures.


COMMENTS ON “THE PARAMORPHIC REPRESENTATION
OF CLINICAL JUDGMENT”¹

JOE H. WARD, JR.
Personnel Laboratory, Wright Air Development Division


The purpose of this discussion is to 
comment on possible misunderstand- 
ings that may arise from the discus- 
sion of relative weights presented in 
Hoffman’s (1960) paper, ‘‘The Para- 
morphic Representation of Clinical 
Judgment.” First, the present au- 
thor certainly agrees that regression 
techniques can be quite useful in the 
study and analysis of judgments. 
Regression analysis can certainly 
play an important role in the study of 
the homogeneity of judgment policies 
among individuals and in the analysis 
of the extent to which variables con- 
tribute to judgment. 

The relative weights presented on page
120 in Hoffman’s article may lead to a
certain amount of misunderstanding about
the “independent contribution” of a
variable in the judgment process. Before
discussing this point it is necessary to
establish what is meant by the term
“independent contribution” of a
variable.

Consider a set of variables,
$X_1, X_2, \cdots, X_k$, which all have
mean values equal to 0. The independent
contribution to prediction of $Y$ of a
single predictor, say $X_1$, refers to
the amount of predictive efficiency that
the residual vector $E$ in the vector
$X_1$ can make when predicting the
criterion $Y$, where the residual $E$
refers to the error remaining when $X_1$
is predicted from a least squares
combination of $X_2, X_3, \cdots, X_k$.
That is, if:

$$X_1 = w_2 X_2 + w_3 X_3 + \cdots + w_k X_k + E$$

where values for $w_i$
($i = 2, \cdots, k$) are “least squares”
coefficients; and if:

$$Y = b_1 E + G$$

where $b_1$ is determined by least
squares and $G$ is the error remaining
when $Y$ is predicted from $E$, then the
idea of independent contribution refers
to the extent to which the residual $E$
can account for the criterion $Y$.
Frequently the term independent
contribution refers to the proportion of
the total variance of $Y$ that is
accounted for by the residual in $X_1$.

¹ The research reported in this paper was
sponsored by Personnel Laboratory, Wright
Air Development Division, under Research
and Development Project 7719, Task 17112.

A regression coefficient reflects the 
value of the independent contribution 
only when the regression coefficient 
equals zero—and then, of course, 
there is no independent contribution. 
When a particular regression coefficient
is different from zero, very little
can be said about the independent 
contribution that the particular vari- 
able associated with the coefficient 
makes toward prediction of the inde- 
pendent variable. 
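
The definition of independent contribution given above can be checked numerically. The sketch below is an illustration, not part of the original comment: the function names and the simulated data are assumptions. It residualizes one predictor on the others, regresses the criterion on that residual, and compares the proportion of variance accounted for with the gain in squared multiple correlation when the predictor is added to the rest. For mean-zero variables the two quantities agree.

```python
import numpy as np

def residualize(target, others):
    """Residual E of `target` after a least-squares fit on the columns of `others`."""
    w, *_ = np.linalg.lstsq(others, target, rcond=None)
    return target - others @ w

def r_squared(X, y):
    """Proportion of the sum of squares of y accounted for by a least-squares fit on X."""
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    fitted = X @ coef
    return float(fitted @ fitted / (y @ y))

def independent_contribution(X, y, j=0):
    """Proportion of the variance of y accounted for by the residual in column j."""
    others = np.delete(X, j, axis=1)
    E = residualize(X[:, j], others)
    b = (E @ y) / (E @ E)          # least-squares coefficient of Y on E
    G = y - b * E                  # error remaining when Y is predicted from E
    return float(1.0 - (G @ G) / (y @ y))

# Simulated data only: three correlated predictors and a criterion,
# all centered so that every variable has mean 0.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
X[:, 1] += 0.7 * X[:, 0]
y = X @ np.array([0.5, -0.3, 0.2]) + rng.normal(size=200)
X -= X.mean(axis=0)
y -= y.mean()

ic = independent_contribution(X, y, j=0)
gain = r_squared(X, y) - r_squared(np.delete(X, 0, axis=1), y)
print(ic, gain)  # the two values agree up to rounding error
```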

The concept of relative weight 
might lead to some confusion about 
the independent contribution of the 
corresponding predictor. Not only 
does it seem difficult to attach mean- 
ing to positive nonzero relative 
weights but it seems particularly 
difficult to interpret negative relative 
weights. 

Consider first the specific example 
that Hoffman presents on page 122, 
in which it is assumed that 7; =.400, 
ro2=.000, r12=.707. The solution of 
the matrix is indicated to yield 
Bo. =.800, Bor = — .566, Ro.12 = .566, 
and squaring the value of Roz we 
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obtain R%o.12=.32. Since the square 
of the correlation of a predictor with 
the judgment criterion indicates the 
proportion of variance accounted for 
when no other variables are con- 
sidered, we can observe that the 
proportion of variance accounted for 
by Predictor Number 1 is .16. It is 
quite apparent that the second pre- 
dictor, when predicting alone, ac- 
counts for no variance in the crite- 
rion. However, let us see what its 
independent contribution is. The 
R*) 42 of the least-squares combina- 
tion is .32; therefore, even though 
the second predictor accounts for no 
variance when predicting alone, its 
independent'contribution is equal to 
16% (.32—.16) of the total criterion 
variance. From this it becomes ap- 
parent that, whereas the first pre- 
dictor when used alone can account 
for only 16% of the total criterion 
variance, the use of the second pre- 
dictor provides an additional 16% of 
the variance. Furthermore, we can 
see that the independent contribu-
tion of Predictor Number 1 is 32%
(.32 − .0) of the total criterion vari-
ance.
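
The arithmetic of this example can be checked with a short computation. The sketch below is only an illustration, assuming the usual least-squares formulas for two standardized predictors; the function name `two_predictor_solution` and the layout of its result are hypothetical and do not come from either article.

```python
def two_predictor_solution(r01, r02, r12):
    """Standardized two-predictor regression (assumes |r12| < 1).

    Hypothetical helper for illustration; not taken from the article.
    """
    denom = 1.0 - r12 ** 2
    beta01 = (r01 - r02 * r12) / denom       # standardized weight, Predictor 1
    beta02 = (r02 - r01 * r12) / denom       # standardized weight, Predictor 2
    r_squared = beta01 * r01 + beta02 * r02  # squared multiple correlation
    return {
        "beta01": beta01,
        "beta02": beta02,
        "R2": r_squared,
        # independent contribution = gain in R^2 over the other predictor alone
        "indep1": r_squared - r02 ** 2,
        "indep2": r_squared - r01 ** 2,
    }


# Correlations of the example discussed above.
example = two_predictor_solution(0.400, 0.000, 0.707)
for name, value in example.items():
    print(f"{name}: {value:.3f}")
```

Because the intercorrelation is entered as the rounded value .707, the printed figures agree with those quoted in the text only to within rounding.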

Several more examples are pre- 
sented in Table 1. The columns of 
Table 1 are defined as: 


$r_{01}$ = correlation between Predictor 1 and the criterion
$r_{02}$ = correlation between Predictor 2 and the criterion
$r_{12}$ = correlation between Predictor 1 and Predictor 2
$\beta_{01}$ = standardized regression coefficient for Predictor 1
$\beta_{02}$ = standardized regression coefficient for Predictor 2
$R^2_{0.12}$ = squared multiple correlation resulting from prediction by Variables 1 and 2
$R^2_{01.2} = R^2_{0.12} - r^2_{02}$ = the independent contribution of Predictor 1
$R^2_{02.1} = R^2_{0.12} - r^2_{01}$ = the independent contribution of Predictor 2
$w_{01}$ and $w_{02}$ = relative weight for predictors (see Hoffman, 1960, p. 122)


Table 1 reveals some of the diffi- 
culty of using the idea of relative 
weight in the interpretation of con- 
tributions of individual variables. It 
can be observed, for example, that 
the relative weights in several differ- 
ent problems can be identical while 
the independent contributions can be 
quite different. Evidently, the con- 
cept of relative weight will not pro- 


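TABLE 1

EXAMPLES OF SOLUTIONS TO PREDICTOR REGRESSION PROBLEMS

[Table body only partly legible in the source. Legible headings: Example No.; Independent contributions ($R^2_{02.1}$); Relative weights ($w_{01}$, $w_{02}$). Two ten-row columns survive without their headings: (.000, .000, .000, .000, .000, .700, .800, .000, .000, .000) and (.000, .000, .000, .000, .000, .235, .053, .000, .000, .000). Other legible entries, in order of appearance: .000, .000, .000, .320, .162, .646, .000, .961, .850, .760, .000, .000, .000, 1.000.]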
The term “independent,” like so
many others in the jargon, is cursed 
by having acquired a number of 
meanings. Perhaps its usage could be 
restricted to imply only experimental 
independence, and the term “orthog-
onal” to convey the notion of sta-
tistical independence. Then both
Ward (1962) and I could be wrong 
together. But as things stand, I be- 
lieve that the differences between us 
are mostly semantic, and therefore 
minor. 

Ward's “independent contribution”
indicates the proportion of
variance in the criterion attributed
to the residual in a predictor, $X_1$,
after variance common to $X_1$ and
other predictors is removed. Such
coefficients are heavily dependent
upon the interrelations among the
variables included for analysis. Precisely,
the independent contribution
of $X_1$ will necessarily be reduced by
another predictor, $X_2$, and by an
amount equal to the variance common
to $X_1$, $X_2$, and the criterion, $X_0$.

Let $r^2_{0(1)}$ be the independent contribution
of $X_1$ before the inclusion of
$X_2$ and $r^2_{0(1.2)}$ be its reduced independent
contribution. Then if we
express the amount of this reduction
by $\delta$ and the percent reduction by
$100\,\delta / r^2_{0(1)}$, it can be shown that:

$$\delta = r^2_{01} + r^2_{02} - R^2_{0.12}$$

$$\frac{100\,\delta}{r^2_{0(1)}} = \frac{100\,(r^2_{01} + r^2_{02} - R^2_{0.12})}{r^2_{01}}$$
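
A numerical check of the reduction may be helpful. The sketch below assumes the standard two-predictor least-squares formulas for standardized variables and takes the contribution of $X_1$ alone to be $r^2_{01}$; the function name `reduction_in_contribution` and the correlations used are hypothetical illustrations, not values from the article.

```python
def reduction_in_contribution(r01, r02, r12):
    """Return (delta, percent) for Predictor 1 when Predictor 2 is added.

    Hypothetical helper; assumes standardized variables and |r12| < 1.
    """
    denom = 1.0 - r12 ** 2
    beta01 = (r01 - r02 * r12) / denom
    beta02 = (r02 - r01 * r12) / denom
    r_squared = beta01 * r01 + beta02 * r02
    before = r01 ** 2                 # contribution of X1 used alone
    after = r_squared - r02 ** 2      # contribution of X1 with X2 present
    delta = before - after            # equals r01^2 + r02^2 - R^2
    percent = 100.0 * delta / before
    return delta, percent


# Illustrative correlations, not taken from the article.
delta, percent = reduction_in_contribution(0.60, 0.50, 0.50)
print(f"delta = {delta:.4f}, percent reduction = {percent:.1f}")
```

With the correlations of Ward's example (.400, .000, .707) the same function returns a negative value of about −.16, so in that case the inclusion of the second predictor raises rather than lowers the contribution of the first.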
It follows that although $X_1$ be a
very satisfactory test of high validity,
it will yield no independent contribu-
tion if all of its valid variance may be
predicted from a linear combination
of a set of other predictors. This
holds of course even if the set of
other predictors contains no variable
which is individually as predictive as
$X_1$. Despite this limitation, coeffi-
cients of independent contribution
are quite useful, especially in empiri-
cal prediction studies.


The problem of assessing the rela-
tive contribution of a variable, i.e.,
the relative importance of that vari-
able as compared with others in-
cluded with it in the same set, is
different from the problem of predic-
tion. It is different because the
primary concern is that of determin-
ing some mathematical representa-
tion of relative importance. This
may be achieved through some kind
of partitioning of the criterion vari-
ance.


The variance of predicted scores
may be partitioned in many ways,
but few are psychologically meaning-
ful. Apportioning it among beta
coefficients or squared beta coeffi-
cients is not meaningful, since not all
of the predictable variance is ac-
counted for. Thus, in the two-
predictor case, using McNemar’s
(1955) notation, the variance of
predicted scores, $\sigma^2_{z'_1}$, may be ex-
pressed as follows:

$$\sigma^2_{z'_1} = \beta^2_2 + \beta^2_3 + 2\beta_2\beta_3 r_{23}$$

When $r_{23} \neq 0$, the squared betas
cannot account for the predictable
criterion variance exclusively in
terms of the independent contribu-
tions of the predictors for the simple
reason that there exists a joint con- 
tribution as well. This joint contri- 
bution may be more than modest, 
particularly where the number of 
predictors is large and their inter- 
correlations at least moderate. Beta 
coefficients are therefore inadequate 
simply because no linear combina- 
tion of beta coefficients or of their 
squares exists which will unambigu- 
ously account for the predictable 
variance of judgments. The same 
may be said of Ward's independent 
contribution coefficients. The only 
exception is the special case in which 
the predictors are orthogonal. On 
the other hand, relative weight,
defined as:

$$w_{0i} = \frac{\beta_{0i}\, r_{0i}}{R^2_{0.12\cdots k}} \qquad [1]$$


provides a means of portraying the 
relative contributions of each of the 
predictors such that a simple sum of
them accounts entirely and unam-
biguously for the predictable vari-
ance. For a contrary point of view on
this problem the reader may wish to 
refer to Ezekiel (1930). 
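
The two partitions contrasted here can be compared numerically. The following sketch assumes the two-predictor case with standardized variables, labels the predictors 1 and 2 (rather than 2 and 3 as in McNemar's notation), and uses the relative weight of Equation 1; the function name `partition_predictable_variance` and the correlations are hypothetical illustrations.

```python
def partition_predictable_variance(r01, r02, r12):
    """Compare two ways of splitting R^2 in the two-predictor case.

    Hypothetical helper; assumes standardized variables and |r12| < 1.
    """
    denom = 1.0 - r12 ** 2
    b1 = (r01 - r02 * r12) / denom
    b2 = (r02 - r01 * r12) / denom
    r_squared = b1 * r01 + b2 * r02
    joint = 2.0 * b1 * b2 * r12            # cross-product (joint) term
    squared_betas = b1 ** 2 + b2 ** 2      # omits the joint term
    w1 = b1 * r01 / r_squared              # relative weight, Equation 1
    w2 = b2 * r02 / r_squared
    return r_squared, squared_betas, joint, w1, w2


# Illustrative correlations, not taken from the article.
r2, sq, joint, w1, w2 = partition_predictable_variance(0.60, 0.50, 0.50)
print(f"R^2 = {r2:.4f}")
print(f"squared betas = {sq:.4f}, joint term = {joint:.4f}")
print(f"relative weights = {w1:.4f} + {w2:.4f} = {w1 + w2:.4f}")
```

In this illustration the squared betas fall short of $R^2$ by exactly the joint term, whereas the two relative weights sum to unity.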

The concept of “relative weight” 
was developed to provide a means by 
which the cognitive processes of 
clinicians (and, for that matter, any- 
one making judgments or decisions) 
might be described. It should be 
noted in this connection that the 
problem of describing the judgment 
process differs from the problem of 
prediction in another way. Judgment 
studies of the type described in my 
previous Psychological Bulletin ar- 
ticle (Hoffman, 1960) deal with a 
system of variables which is finite or 
“closed.” Since in the experimental
arrangement by which the judgments 
are obtained only known quantitative
information is available to the judge,
the criterion variance (variance of 
judgments) must be completely ac- 
counted for by two factors: one of 
these involves some combination of 
the predictor information, not neces- 
sarily linear, perhaps quite complex; 
the other factor is chance. This being 
the case, it is meaningful to speak of 
the possibility of measuring the de- 
gree to which the criterion variance 
may be accounted for by the relation 
of one variable to the others avail- 
able to the judge, i.e., within the 
system but completely independent 
of external variables which might be 
thrown into the regression analysis 
at will. 

A fair test of a coefficient which 
presumably reflects the relative con- 
tribution of a predictor in the judg- 
ment process is one which compares 
the value of the obtained coefficient 
when the predictor is a member of the 
set available to the judge with its 
value when the predictor is absent 
but yet included in the multiple 
regression analysis. A predictor 
available to the judge and “used” by
him should be capable of being de- 
scribed by a coefficient which has at 
least a moderate value. When this 
predictor is experimentally absent 
from the judgment situation, the 
value of the coefficient should drop 
to a chance level. A poor (in this 
sense) type of coefficient would be 
one which is affected little by this 
kind of manipulation.

The coefficient which Ward refers 
to as independent contribution will 
not ordinarily satisfy this test. The 
introduction of an external predictor 
into such a closed system reduces the 
independent contributions of the
internal predictors since variance
which is common to an original
predictor, an external predictor, and
the criterion is subtracted. Not
everything is bad, however. The 
independent contribution of such an 
external predictor may be expected 
to be no greater than chance, and 
this will assuredly be reflected in the 
near-zero values of the coefficient. 
But the values of the coefficients for 
the original predictors will neverthe-
less be determined by the interrela-
tionships among the variables in-
cluded in the analysis. The coeffi-
cient of independent contribution is
therefore unsuitable for this purpose.
Beta coefficients are likewise affected 
by such manipulations although not 
to such a great extent. On the other 
hand, relative weights pass this test 
with flying colors, as will be shown in 
a forthcoming article. 

Another point raised by Ward has 
to do with the relationship between 
relative weights and coefficients of 
independent contribution. Although 
the concepts are different, I do not 
completely agree that no relation- 
ship exists between relative weights 
and independent contributions. It 
may be shown, for example, that in 
the case of two predictors, multiply- 
ing the ratio of the two independent 
contributions by the ratio of the 
corresponding squared validity co- 
efficients yields the ratio of the 
squared relative weights. That is: 


$$\frac{w^2_{01}}{w^2_{02}} = \frac{r^2_{0(1.2)}}{r^2_{0(2.1)}} \cdot \frac{r^2_{01}}{r^2_{02}} \qquad [2]$$


It is of interest also to note that the 
ratio of the independent contribu- 
tions is equal to the ratio of the 
squared beta coefficients. Multiply-
ing Equation 2 by $\beta^2_{01}/\beta^2_{02}$ we obtain:

$$\frac{w^2_{01}\,\beta^2_{01}}{w^2_{02}\,\beta^2_{02}} = \frac{r^2_{0(1.2)}\; r^2_{01}\,\beta^2_{01}}{r^2_{0(2.1)}\; r^2_{02}\,\beta^2_{02}} \qquad [3]$$


and since, from Equation 1: 


$$r^2_{0i}\,\beta^2_{0i} = w^2_{0i}\, R^4_{0.12\cdots k}$$


it follows that: 


$$\frac{r^2_{0(1.2)}}{r^2_{0(2.1)}} = \frac{\beta^2_{01}}{\beta^2_{02}}$$

In contrast, relative weights may
be said to represent a product of two 
proportions: the first is the propor- 
tion that the independent contribu- 
tion bears to the residual of the 
predictor when the joint effects of 
the other predictors have been re- 
moved. This term is identically 
equal to the squared beta coefficient. 
The second term is the proportion of 
total predictable variance in the 
criterion which is common to the 
predictor in question. 
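
The two-predictor relations stated above, namely that the ratio of the independent contributions equals the ratio of the squared betas, and that this ratio multiplied by the ratio of the squared validities gives the ratio of the squared relative weights, can be verified numerically. The sketch assumes standardized variables and independent contributions defined as gains in the squared multiple correlation; the function name `check_two_predictor_identities` and the correlations are hypothetical illustrations.

```python
def check_two_predictor_identities(r01, r02, r12):
    """Return the four ratios that enter the two identities.

    Hypothetical helper; assumes standardized variables, |r12| < 1,
    and nonzero r02 and beta02 so that the ratios are defined.
    """
    denom = 1.0 - r12 ** 2
    b1 = (r01 - r02 * r12) / denom
    b2 = (r02 - r01 * r12) / denom
    r_squared = b1 * r01 + b2 * r02
    indep1 = r_squared - r02 ** 2      # independent contribution, Predictor 1
    indep2 = r_squared - r01 ** 2      # independent contribution, Predictor 2
    w1 = b1 * r01 / r_squared          # relative weights
    w2 = b2 * r02 / r_squared
    contribution_ratio = indep1 / indep2
    beta_ratio = b1 ** 2 / b2 ** 2
    weight_ratio = w1 ** 2 / w2 ** 2
    validity_ratio = r01 ** 2 / r02 ** 2
    return contribution_ratio, beta_ratio, weight_ratio, validity_ratio


# Illustrative correlations, not taken from the article.
c_ratio, b_ratio, w_ratio, v_ratio = check_two_predictor_identities(0.60, 0.50, 0.50)
print(f"independent contributions: {c_ratio:.4f}, squared betas: {b_ratio:.4f}")
print(f"squared relative weights: {w_ratio:.4f}, product: {c_ratio * v_ratio:.4f}")
```

The check covers only the case of two predictors, which is the case to which the text restricts these relations.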

To conclude, my usage of the 
phrase independent contribution is 
intended to connote that the variance 
of predicted scores may be success- 
fully partitioned into a simple sum of 
ingredients, each referring to a spe- 
cific predictor and each being inde- 
pendent of any joint effect or inter- 
action. What Ward means by the 
term independent contribution is 
more ordinarily known as the “part 
correlation” or, actually, the square 
of the part correlation (cf. DuBois, 
1957). Using the term in Ward’s 
sense, I thoroughly agree with him 
that the concept of relative weight 
provides little information about the 
independent contribution of a pre-
dictor. It was not intended to pro-
vide such information. It is also
correct that the concept of independ- 
ent contribution provides little in- 
formation about the contribution of 
a predictor relative to the contribu- 
tions of other predictors in a given 
set, nor is it intended as a method for 
assessing this aspect of the judgment 
process.

ERRATA
In the article by P. L. Broadhurst and J. L. Jinks in the September 1961 
issue, the second equation at the top of the second column on page 338 should 
read: 


$$[h] = F_1 - 4F_2 - \tfrac{1}{2}P_1 - \tfrac{1}{2}P_2 + 2B_1 + 2B_2$$


The first equation in the second column on page 353 should read: 


$$[h] - [i] = F_1 - \tfrac{1}{2}(P_1 + P_2)$$


In the article by Charles S. Morrill in the September 1961 issue, the refer- 
ences by R. E. Silverman were published by the United States Naval Train- 
ing Device Center. The reference by N. A. Crowder, 1959b, was published in 
1955. 

In the article by Mark R. Rosenzweig in the September 1961 issue, the 
quotation from Ades and Brookhart on page 384 should read as follows: 
“that the inferior colliculus with its strong commissural connections and
connections to efferent [not afferent] mechanisms may be the principal device
responsible for localization” (1950, p. 203).